DOCUMENT RESUME 



ED 326 649 



CE 056 395 



TITLE           Military Training. Its Effectiveness for Technical
                Specialties Is Unknown. Report to the Secretary of
                Defense.
INSTITUTION     General Accounting Office, Washington, DC. Program
                Evaluation and Methodology Div.
REPORT NO       GAO/PEMD-91-4
PUB DATE        Oct 90
NOTE            106p.
AVAILABLE FROM  U.S. General Accounting Office, P.O. Box 6015,
                Gaithersburg, MD 20877 (first five copies free;
                additional copies $2.00 each; 100 or more, 25%
                discount).
PUB TYPE        Reports - Evaluative/Feasibility (142)

EDRS PRICE      MF01/PC05 Plus Postage.
DESCRIPTORS     Armed Forces; Data Collection; *Evaluation Criteria;
                *Evaluation Methods; Evaluation Needs; *Evaluation
                Problems; Evaluation Utilization; Females; Job
                Performance; *Military Training; Minority Groups;
                *Outcomes of Education; Postsecondary Education;
                *Program Effectiveness; Program Evaluation;
                Reliability; Technical Education; Validity



ABSTRACT 

A study examined the information collected by the 
Department of Defense on both the quality of its new recruits and the 
effectiveness of its training in preparing recruits to operate in a 
technologically sophisticated environment. It found that data were
collected at a recruit's entrance to military life, during and upon
completion of formal training, and after assignment to a military 
specialty in the field. The study showed that for most recruits, the 
services' selection criteria are moderately successful at predicting 
individual performance during classroom technical training. However, 
they are notably less successful for women and minority recruits.
Each service has evaluation mechanisms in place, but only the Army 
systematically collects data on the field performance of individual 
graduates in a way that would allow comparison of a graduate's 
on-the-job performance with entry-level ability and classroom 
performance. These data reveal an even weaker connection for women
and minority group members between criteria used to assign them to 
technical specialties and their later field performance. The field 
evaluation practices of the Navy are particularly fragmented and have 
deteriorated during the 1980s. The lack of reliable field performance 
data in the Navy and the Air Force makes realistic assessment of 
training effectiveness impossible. The study also found that the 
quality of recruits in respect to education increased during the 
early 1980s, but leveled off later in the decade. (KC) 



***********************************************************************
* Reproductions supplied by EDRS are the best that can be made
* from the original document.



United States General Accounting Office

Report to the Secretary of Defense



October 1990 



MILITARY TRAINING

Its Effectiveness for
Technical Specialties Is
Unknown



GAO/PEMD-91-4




GAO 



United States 

General Accounting Office 

Washington, D.C. 20548



Program Evaluation and 
Methodology Division 

B-239914 

October 16, 1990 

The Honorable Richard B. Cheney 
The Secretary of Defense 

Dear Mr. Secretary: 

In this report, we review the information sources on which the services base their 
evaluations of the effectiveness of their technical training programs, recruit selection, and 
classification decisions. We undertook this review because the technical sophistication of 
modern weaponry has intensified the need for well-qualified recruits and effective technical 
training. This report identifies some critical gaps in the services' ability to measure how 
effectively they are selecting and preparing recruits to use and maintain today's complex 
weapons systems. 

This report contains recommendations in Chapter 5. The head of a federal agency is required 
by 31 U.S.C. 720 to submit a written statement on actions taken on these recommendations to 
the Senate Committee on Governmental Affairs and the House Committee on Government 
Operations not later than 60 days after the date of the report and to the House and Senate 
Committees on Appropriations with the agency's first request for appropriations made more
than 60 days after the date of the report. 

We are sending copies of this report to appropriate House and Senate committees, members 
of Congress from the states mentioned in the report, and the Director of the Office of 
Management and Budget. We will also make copies available to interested organizations, as
appropriate, and to others upon request. 

If you have any questions or would like additional information, please call me at (202) 275- 
1854. Major contributors to the report are listed in appendix VI. 

Sincerely yours,




Eleanor Chelimsky 
Assistant Comptroller General 



Executive Summary 



The ability of the armed forces to carry out their mission into the next 
century will depend on both hardware and personnel considerations: the 
reliability and appropriateness of weapons systems, the quality of mili- 
tary personnel, and the "fit" of human skills to the operating demands 
of weapons systems. If the entry-level aptitude, knowledge, and skills of
new recruits should fall short of the human requirements needed to
operate and maintain new technologically sophisticated systems, greater 
demands would be placed on the armed services to compensate for the 
shortfall through training. The purpose of this report was to examine 
the information collected by the Department of Defense (DOD) on both
the quality of its new recruits and the effectiveness of its training in 
preparing recruits to operate in a technologically sophisticated 
environment. 



Background

A recruit is admitted to military service and assigned to an occupational
specialty on the basis of tests taken at recruitment. Upon completion of
basic training, most recruits receive additional classroom training in
their specialty and then are assigned to perform the specialty in the 
field. This typical sequence encompasses the three points in a recruit's 
service career where data critical to evaluating the success of training 
must be collected: at entrance to military life, during and upon comple- 
tion of formal training, and after assignment to a military specialty in 
the field. 



An adequate system of assessing training effectiveness must include 
reliable and valid information at each of these points, and should 
examine the interrelationships among these data points to test the congruence
of initial selection and placement data, classroom measures, and
the ultimate criterion — field performance. 



During the mid-1980's, the services reported dramatic improvements in 
the general qualifications of new recruits. The improvements were
attributed to better compensation and educational benefits, increased 
recruiting efforts, and heightened public appreciation of the military 
role. These reports did not, however, address the specific area of tech- 
nical qualifications among recruits. More recently, the services have 
reported difficulty in filling their quotas with highly qualified recruits. 
This perceived decline in the ability levels of recruits entering training 
raises questions about the reality of that decline, about its magnitude, 
about the effectiveness of the process by which recruits are selected for
training, and about the actual on-the-job performance of those recruits. 



Page 2    GAO/PEMD-91-4 Military Technical-Training Effectiveness Is Unknown



Results in Brief 



GAO found that the aptitude level of recruits did increase during the 
1980's but that most of the improvement occurred during the first half
of the decade. Since then, little change has occurred in general aptitude 
for training, but the levels of some of the more technical skills have 
declined among recruits, in one case below the 1981 level. Women and 
members of minority groups consistently scored lower in tests used to 
assign recruits to more technical occupational specialties such as radar 
specialist positions. 

GAO concluded that, for most recruits, the services' selection criteria are
moderately successful at predicting individual performance during 
classroom technical training. However, they are notably less successful 
for women and minority recruits. 

Each service has evaluation mechanisms in place, but only the Army
systematically collects data on the field performance of individual grad- 
uates in a way that would allow comparison of a graduate's on-the-job 
performance with his or her entry-level ability and classroom perform- 
ance. These data reveal an even weaker connection for women and 
minority group members between criteria used to assign them to tech- 
nical specialties and their later field performance. The field evaluation 
practices of the Navy are particularly fragmented and have deteriorated
during the 1980's. GAO found that the lack of reliable field performance
data in the Navy and the Air Force makes realistic assessment of 
training effectiveness impossible. 

GAO concluded that the insensitivity of selection and placement measures
as predictors of future success for female and minority recruits is
a matter of serious concern in view of the military's increasing reliance 
on these groups to perform technical roles. 



Principal Findings 



Recent Quality Trends 



All services administer the Armed Services Vocational Aptitude Battery
(ASVAB) to new recruits. The primary measure of a recruit's aptitude is
the Armed Forces Qualification Test (AFQT), which is made up of four
ASVAB subtests. AFQT scores have tended to level off after rising in the
early 1980's. Average scores on three of the four subtests used to select
candidates for technical training have declined since mid-decade, and
scores on one, the Electronics Information subtest, are lower than in
1981. A smaller percentage of recruits now qualify for the most
demanding technical specialties than at any time since 1981. Women and 
minority group members are severely underrepresented among quali- 
fiers because they score lower, on average, than white males. (See pages 
18-31.) 



Classroom Evaluation Measures

Each service has established evaluation mechanisms to monitor instructional
quality and curriculum coverage in classroom training. Overall,
grading procedures in the courses GAO reviewed appeared to discriminate
acceptably well among levels of student performance (with the
exception of some Army courses where recorded grades were unreliable
indicators of classroom performance). (See pages 32-34, 36-38, and 40-41.)

Selection criteria from ASVAB are moderately successful in predicting the
performance of most students for training, but are significantly less reliable
predictors for women and minority students. While these groups
appeared to overcome their lower scores on aptitude measures in the
Navy and Air Force courses reviewed, the differences in classroom performance
for nonwhite and female students persisted throughout the
Army technical courses reviewed. (See pages 34-36, 38-39, and 40-41.)

GAO developed a statistically more sophisticated summary score from 
ASVAB using factor analysis. This factor score generally performed better 
than AFQT and the Electronics Composite score in predicting final grades 
for all demographic groupings. This finding suggests that broader-based
selection criteria than those currently in use could be more reliable
predictors of classroom performance, at least in the technical areas GAO
reviewed. (See pages 36, 39, and 41.)



Field Measures of Training Effectiveness

The Army's Skill Qualification Test provides the only objective, systematically
collected estimates of the field performance of individual graduates
of training. The Air Force and the Navy rely instead largely on
feedback mechanisms through which field commanders and supervisors
may submit complaints to the training community if they believe their
graduates have been inadequately trained. In addition, Air Force evaluation
units periodically survey a sample of supervisors of course graduates
for their perceptions of the quality and appropriateness of training.
A similar practice was followed in the Navy until the mid-1980's.
Internal reports have been sharply critical of the quality of the Navy's
training assessment procedures, but these deficiencies are only slowly
being corrected. (See pages 45-50.)

Field performance measures have been developed by DOD under the
Joint-Service Job Performance Measurement project and may be applicable
to training assessment purposes. (See page 51.)

ASVAB scores in our sample are weaker predictors of field performance as 
measured by the Army than they are of classroom performance and 
only predict well for white male recruits. The factor scores developed by 
GAO are better predictors than either AFQT or the Electronics qualifying
scores used by the Army. No ASVAB score was significantly correlated
with field performance for women or minority soldiers. (See pages 45- 
46.) 



Recommendations

GAO believes that evaluating the effectiveness of the training provided
by the services is crucial if they are to meet the future challenges of
changing demographics and increasingly sophisticated weaponry. GAO
therefore recommends that the Assistant Secretary of Defense for Force
Management and Personnel attempt to develop more sensitive indicators
of classroom and field performance in technical specialties for women
and minority recruits from extant data. GAO also recommends that the
Assistant Secretary review alternative measures of field performance
already developed by the services under the Job Performance Measurement
project for their applicability to training and on-the-job performance
evaluation. GAO further recommends that the Secretary of the
Army direct the Training and Doctrine Command to review for accuracy,
appropriateness, and reliability the classroom grading procedures
identified within the report as deficient. Finally, GAO recommends that
the Secretary of the Navy establish a firm deadline for developing a
training evaluation program and that he direct that current resources
allocated to this effort be reexamined for their adequacy.



Agency Comments

In a written response to a draft of this report, DOD concurred with all of
its recommendations and identified specific actions to be taken toward
implementing them. DOD also concurred or partially concurred with what
it identified as the main findings contained in the report. (See appendix
V.) We have reviewed these comments and, where appropriate, have
made changes to the text.



Contents



Executive Summary

Chapter 1                                                              10
Introduction
    Recruit Quality in the 1980's                                      10
    Recruit Training                                                   11
    Objectives, Scope, and Methodology                                 13
    Strengths and Limitations of Our Study                             17

Chapter 2                                                              18
The Quality of Military Recruits: 1981-89
    Armed Services Vocational Aptitude Battery (ASVAB)                 18
    Summary and Conclusions                                            30

Chapter 3                                                              32
Classroom Measures of Training Effectiveness
    Army                                                               33
    Navy                                                               36
    Air Force                                                          39
    Summary and Conclusions                                            42

Chapter 4                                                              45
Field Measures of Training Effectiveness
    Army                                                               45
    Navy                                                               48
    Air Force                                                          50
    Alternative Data Sources: The Job Performance
      Measurement Project                                              51
    Summary and Conclusions                                            52

Chapter 5                                                              53
Summary, Recommendations, and Agency Comments and Our Response
    Summary                                                            53
    Recommendations                                                    54
    Agency Comments and Our Response                                   55





Appendixes

    Appendix I: AFQT Mean Score and Electronics Composite
      Summary Statistics: 1981-89                                      60
    Appendix II: Predictor and Criterion Variable Mean Scores          64
    Appendix III: Intercorrelation of Study Variables by
      Occupational Specialty                                           66
    Appendix IV: Army SQT Mean Scores, by Occupational Specialty       77
    Appendix V: Comments From the Department of Defense                78
    Appendix VI: Major Contributors to This Report                    103

Tables

    Table 1.1: How AFQT Test Results Are Categorized                   15
    Table 3.1: Army Occupational Specialties Reviewed                  33
    Table 3.2: Mean Scores on Predictor and Criterion
      Variables, Army                                                  34
    Table 3.3: Intercorrelation of Study Variables, Army               35
    Table 3.4: Occupational Specialties Reviewed, Navy                 37
    Table 3.5: Mean Scores on Predictor and Criterion
      Variables, Navy                                                  37
    Table 3.6: Intercorrelation of Study Variables, Navy               39
    Table 3.7: Occupational Specialties Reviewed, Air Force            40
    Table 3.8: Mean Scores on Predictor and Criterion
      Variables, Air Force                                             40
    Table 3.9: Intercorrelation of Study Variables, Air Force          42
    Table 4.1: Correlation of SQT and Predictor Variables              46
    Table I.1: AFQT Mean Scores, by Gender                             60
    Table I.2: AFQT Mean Scores, by Service                            60
    Table I.3: AFQT Mean Scores, by Race/Ethnicity                     61
    Table I.4: AFQT Mean Score Overall Totals                          61
    Table I.5: Electronics Composite Mean Scores, by Gender            62
    Table I.6: Electronics Composite Mean Scores, by Service           62
    Table I.7: Electronics Composite Mean Scores, by
      Race/Ethnicity                                                   63
    Table I.8: Electronics Composite Mean Score Overall Totals         63
    Table II.1: Army Mean Scores                                       64
    Table II.2: Navy Mean Scores                                       64
    Table II.3: Air Force Mean Scores                                  65
    Table III.1: Intercorrelation of Study Variables: Army, 24J        66
    Table III.2: Intercorrelation of Study Variables: Army, 27N        67
    Table III.3: Intercorrelation of Study Variables: Army, 29V        68
    Table III.4: Intercorrelation of Study Variables: Navy, AQ         69
    Table III.5: Intercorrelation of Study Variables: Navy, AX         70
    Table III.6: Intercorrelation of Study Variables: Navy, STG        71
    Table III.7: Intercorrelation of Study Variables: Navy, STS        72
    Table III.8: Intercorrelation of Study Variables: Air Force,
      45530A                                                           73
    Table III.9: Intercorrelation of Study Variables: Air Force,
      45530B                                                           74
    Table III.10: Intercorrelation of Study Variables: Air
      Force, 30332                                                     75
    Table III.11: Intercorrelation of Study Variables: Air
      Force, 30333                                                     76



Figures

    Figure 1.1: Recruit Training Process                               12
    Figure 1.2: Data Sources and Comparisons                           14
    Figure 2.1: Mean AFQT Scores, by Gender: 1981-89                   19
    Figure 2.2: Mean AFQT Scores, by Race/Ethnicity: 1981-89           20
    Figure 2.3: Mean AFQT Scores, by Service: 1981-89                  21
    Figure 2.4: Mean AFQT Subtest Scores, 1981-89                      22
    Figure 2.5: Mean Electronics Composite Scores, by
      Gender: 1981-89                                                  23
    Figure 2.6: Mean Electronics Composite Scores, by
      Race/Ethnicity: 1981-89                                          24
    Figure 2.7: Mean Electronics Composite Scores, by
      Service: 1981-89                                                 25
    Figure 2.8: Mean Electronics Composite Subtest Scores,
      1981-89                                                          26
    Figure 2.9: Number of Recruits Qualifying for Training as
      Control and Warning Radar Specialists, 1981-89                   27
    Figure 2.10: Percent of Recruits Qualifying for Training
      as Control and Warning Radar Specialists, 1981-89                28
    Figure 2.11: Number of Recruits Qualifying for Training
      as Systems Repair Technicians, 1981-89                           29
    Figure 2.12: Percent of Recruits Qualifying for Training
      as Systems Repair Technicians, 1981-89                           30







Abbreviations 



AFQT    Armed Forces Qualification Test
ASVAB   Armed Services Vocational Aptitude Battery
DOD     Department of Defense
FLETAP  Fleet Training Assessment Program
GAO     General Accounting Office
ISD     Instructional System Development
JPM     Job Performance Measurement
NTSC    Naval Training Systems Center
SQT     Skill Qualification Test
TAST    Training Assessment Survey Team






Chapter 1
Introduction



The ability of the armed forces to carry out their mission into the next 
century will depend on both hardware and personnel considerations: the 
reliability and appropriateness of weapons systems, the quality of military
personnel, and the "fit" of human skills to the operating demands
of weapons systems. If the entry-level aptitude, knowledge, and skills of
new recruits should fall short of the human requirements needed to
operate and maintain new technologically sophisticated weapons systems,
greater demands would be placed on the armed services to compensate
for the shortfall through training. In this report, we will
examine the information collected by DOD on both the quality of its new
recruits and the effectiveness of its training in preparing recruits to
operate in a technologically sophisticated military environment.



Recruit Quality in the 1980's

In hearings before the House Appropriations Committee on the fiscal
year 1988 budget for DOD, the Assistant Secretary for Force Management
and Personnel characterized the changes since 1980 in the nation's
armed forces in these words: "Today we are recruiting the highest
quality personnel in history. (The services' personnel possess) . . . high
intelligence, correct experience mix, (and) high skill levels." The reasons
cited for this "most remarkable turnaround in peacetime history" were
many: higher pay and improved quality of life for members of the
armed forces; the recession and consequent unemployment of the early
1980's, which widened the pool of applicants; improved educational
benefits for military service; more intensive and effective recruiting;
and recovery from the poor public perception of the military following
the war in Vietnam.

The statistics cited by DOD supported this favorable view. In 1980, 68
percent of recruits were high school graduates (versus 75 percent for
the youth population in general). By 1986, 92 percent of recruits had
high school diplomas. Whereas 65 percent of recruits in 1980 scored in
the top three mental categories on the Armed Forces Qualification Test
(versus 69 percent for the norm group), in 1986, 96 percent achieved
this level.

Yet the demographic and educational realities of the immediate future
are likely to affect this optimistic scenario. The number of young people
available for the military recruit pool will continue to diminish until the
mid-1990's.¹ The composition of the recruit pool will also shift.
According to research sponsored by the Department of Labor, by the
year 2000 five of every six new labor force entrants will be female,
minority group members, or immigrants.² Meanwhile, the graduates of
the American educational system are said to be falling further behind
the youth of competitor nations in technological literacy at the same
time that U.S. weapons systems are becoming increasingly
sophisticated.³

DOD has also begun to voice concern. Hints of uneasiness emerged in the 
fiscal year 1988 appropriations hearings when the Air Force reported 
increased difficulty in securing quality recruits. In the same hearings,
the Navy expressed its concern over the steady erosion of its Delayed
Entry Pool (the program under which applicants agree to enter the service
within a year). In addition, for the first time in eight years, the
Army failed to meet its quarterly recruiting quota in the first quarter of 
fiscal year 1989. 



Recruit Training

Figure 1.1 identifies the typical sequence that occurs during the early
stages of a recruit's time in the military. As shown, after their basic
training (the length and content of which vary by service), most
recruits attend additional training to equip them to function effectively
in some occupational specialty. The recruit's area of specialization is
determined by service needs, qualifications as determined on tests
administered during the recruiting process, and individual interests.



¹U.S. Bureau of the Census, Projections of the Population of the United States, by Age, Sex, and Race:
1988 to 2080, Current Population Reports, Series P-25, No. 1018 (Washington, D.C.: U.S. Government
Printing Office, 1989), p. 6.

²William B. Johnston and Arnold H. Packer, Workforce 2000: Work and Workers for the 21st Century
(Indianapolis, Indiana: Hudson Institute, 1987), p. 95. See also U.S. Office of Personnel Management,
Civil Service 2000 (Washington, D.C.: U.S. Government Printing Office, 1988).

³Martin Binkin, Military Technology and Defense Manpower (Washington, D.C.: The Brookings Institution,
1986). See also Aerospace Education Foundation, America's Next Crisis: The Shortfall in Technical
Manpower (Arlington, Va.: The Aerospace Education Foundation, 1989); and National Research
Council, A Challenge in Numbers: People in the Mathematical Sciences (Washington, D.C.: National
Academy of Sciences, 1990).




Figure 1.1: Recruit Training Process 




The training curriculum for each occupational specialty is designed
through a structured set of procedures called Instructional System
Development (ISD) that draws heavily on the work by Tyler and others
on the behavioral objectives of instruction.⁴ The ISD model consists of the
following five steps:

1. Determine job requirements through detailed analysis of tasks performed
in an occupational specialty.

2. Determine type of instruction (formal classroom, on-the-job, or other)
that best suits the student population and task requirements.

3. Develop objectives that specify the desired behaviors, the conditions
under which they are to be demonstrated, and an acceptable standard of
performance.

4. Plan and develop instructional methods, media, and equipment.

5. Conduct and evaluate instruction.

⁴See, for example, R.W. Tyler, Basic Principles of Curriculum and Instruction (Chicago: University of
Chicago Press, 1950); and R.W. Tyler, R.M. Gagne, and M. Scriven, Perspectives of Curriculum Evaluation
(Chicago: Rand McNally, 1967).

A student's progress through an ISD-developed curriculum is measured
by criterion-referenced tests at the end of each block of training. A student
passes the course after he or she has performed each task identified
as a job requirement at the level of competency defined as
acceptable. Continuous monitoring of job requirements is needed to 
assure that course objectives remain relevant. 

Upon successful completion of classroom training in the occupational 
specialty, the recruit is ready for assignment in the field to carry out the 
duties requiring the skills acquired during training. Formal training is 
now complemented by the necessary on-the-job training to permit the 
recruit to function as part of a unit with a defined mission in a real- 
world setting. 



Objectives, Scope, and Methodology

The purpose of our study is twofold: to profile the aptitudes of the
recruits who entered the service from 1981 to 1989, and to evaluate the
military service's ability to select successful trainees and to assess their
training and work performance. We will examine the three points in a
recruit's service career where data critical to performing a thorough
evaluation of training must be collected: (1) at entrance to military life,
prior to assignment to an occupational specialty; (2) during training,
when the recruit's mastery of the specialty's basics is assessed; and (3)
after assignment to the field, where what was learned in the classroom
must be applied in the work environment. (See figure 1.2.)








Figure 1.2: Data Sources and Comparisons




The evaluation model underlying our review assumes the need to inter- 
relate these three points. Comparing the information collected at points 
1 and 2 can provide some insight into the ability of the services to pre- 
dict how well recruits will perform in training on the basis of their 
scores in qualifying tests. The strength of the relationship between 
points 2 and 3 is a partial measure of the validity and effectiveness of 
training. Finally, the relationship between points 1 and 3 is an estimate 
of the effectiveness of the services' selection and training procedures. 
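
The three comparisons described above can be sketched in code. The following Python example is an illustration only and is not the analysis performed for this report: it assumes each recruit has a single numeric score at each point (a qualifying test score at entry, a final course grade, and a field performance score) and uses the Pearson correlation coefficient as the measure of relationship. The function names and the five sets of scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def compare_points(entry, classroom, field):
    """Relate the three data points pairwise, as in the evaluation model."""
    return {
        # Points 1 and 2: do qualifying scores predict classroom results?
        "entry_vs_classroom": pearson(entry, classroom),
        # Points 2 and 3: do classroom results carry over to the field?
        "classroom_vs_field": pearson(classroom, field),
        # Points 1 and 3: selection and training taken together.
        "entry_vs_field": pearson(entry, field),
    }


# Made-up scores for five hypothetical recruits (not data from the report).
entry = [95, 102, 110, 118, 125]
classroom = [71, 78, 80, 88, 90]
field = [60, 72, 65, 80, 75]

result = compare_points(entry, classroom, field)
for name, r in result.items():
    print(f"{name}: r = {r:.2f}")
```

Running the same comparison separately for each demographic group would show whether a predictor that works for one group works equally well for another.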

The model is, of course, simplistic and in need of considerable expansion.
A fully detailed model would have to consider other influences on
performance, such as on-the-job experiences, and would need to be able 
to determine the location of a problem if relationships between the three 




points were weaker than anticipated. Yet, the model, at whatever level 
of sophistication, would at a minimum require data at these three critical
points in a recruit's service career.

We reviewed the information collection practices of each service at the 
three points identified in the model. For a selected number of occupa- 
tional specialties — our focus is on training for the more technical occu- 
pational specialties — we reviewed the data that have been collected for 
insights they provide into the service's selection and evaluation proce- 
dures, particularly as they affect women and minority groups. 

Our study is organized around three evaluation questions, each corre- 
sponding to one of the model data points. Each question is addressed in 
a separate chapter. 

1. How has the aptitude of recruits for technologically sophisticated spe- 
cialties changed since 1980? 

DOD tracks recruit aptitude according to four broad mental categories
based on the scores on the Armed Forces Qualification Test (AFQT). (See
table 1.1.) AFQT is a composite of four of the ten tests from the Armed
Services Vocational Aptitude Battery (ASVAB) administered to every
potential recruit. We examined some other components of ASVAB in
greater detail, particularly those subtests that are used to qualify candidates
for high technology occupational specialties.



Table 1.1: How AFQT Test Results Are Categorized

AFQT category    AFQT percentile score    Trainability
I                93-99                    Well above average
II               65-92                    Above average
IIIA             50-64                    Average
IIIB             31-49                    Average
IV               10-30                    Below average
Vᵃ               1-9                      Well below average

ᵃCategory V examinees are excluded by law from military service.
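
As an illustration only, the categorization in table 1.1 can be written as a small lookup. The sketch below is in Python; the function name `afqt_category` and the error handling are hypothetical, and the category II band is taken to be 65-92, which is what the adjacent bands imply.

```python
# Percentile bands from table 1.1: (low, high, category, trainability).
AFQT_CATEGORIES = [
    (93, 99, "I", "Well above average"),
    (65, 92, "II", "Above average"),
    (50, 64, "IIIA", "Average"),
    (31, 49, "IIIB", "Average"),
    (10, 30, "IV", "Below average"),
    (1, 9, "V", "Well below average"),  # excluded by law from military service
]


def afqt_category(percentile):
    """Return (category, trainability) for an AFQT percentile score of 1-99."""
    if not isinstance(percentile, int) or not 1 <= percentile <= 99:
        raise ValueError("AFQT percentile score must be an integer from 1 to 99")
    for low, high, category, trainability in AFQT_CATEGORIES:
        if low <= percentile <= high:
            return category, trainability
    # The bands cover 1-99 without gaps, so this line is never reached.
    raise AssertionError("no band matched")


print(afqt_category(70))
```

Keeping the bands in one table rather than in a chain of comparisons makes it easy to check them against the published categories.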



2. How useful are the data collected by the services before and during 
classroom training for selecting individuals for high technology roles 
and for evaluating the effectiveness of this training? 

We examined the measures of recruit performance collected during
training and assessed their utility for evaluating training effectiveness,
as well as for providing information on the validity of procedures used
to assign recruits to training.

3. How well do the services' selection criteria and training evaluation 
measures predict success in high technology roles? 

We examined the procedures used by each of the services to assess the
impact of training on actual job performance. We also related these
procedures to the ASVAB scores used to select trainees and to classroom
measures of training success, in order to estimate the predictive validity of
these measures.

In view of the demographic shifts projected for the labor force over the 
next decade, we provided separate answers to each of these questions, 
wherever possible and appropriate, for women and minorities. 

We defined high technology roles as those occupational specialties for 
which the services require a qualifying score in electronics substantially 
above the mean. For our review, we selected a sample of 13 such 
courses — five from the Army and four each from the Navy and the Air 
Force — from which we collected data on individual student perform- 
ance. Each of these courses is intended to provide a recruit the neces- 
sary introductory training to qualify as an apprentice in his specialty. 

In the course of our review, we interviewed officials responsible for 
training evaluation in the Office of the Secretary of Defense and within 
each of the three services. We visited four service training centers and 
the facilities maintained by each of the services for research into 
training and other personnel issues, as well as the Training Performance 
Data Center in the Office of the Secretary of Defense. Our final data 
base was compiled from information received from all of these sources, 
but our primary source for ASVAB and demographic data was the Defense
Manpower Data Center. We also received information from the Center
for Naval Analyses on technical adjustments to ASVAB validity estimates,
and on the ASVAB norm group. This study was conducted in accordance
with generally accepted government auditing standards. 






Strengths and
Limitations of Our
Study

Our review of the quality trends among the 2.3 million recruits who
entered military service from 1981 to 1989 is more finely grained than
the traditional counts of recruits in each of four mental categories
routinely reported to the Congress. We report the differences among racial
groupings and between male and female recruits, and we examine
differential trends among the various areas measured by ASVAB. We
assumed the reliability and validity of the widely researched ASVAB and
its subtests and made no independent review of these factors. However,
we did develop an independent scoring procedure for ASVAB that
suggests an alternative, and apparently more valid, approach to assigning
recruits to occupational specialties.

The intent of our review of classroom grades and other evaluation mea- 
sures was to identify the major sources of training evaluation informa- 
tion now in place in the services, and to make use of the objective data 
we collected to address some concerns about recent trends in recruit 
quality and the future composition of the recruit pool. 

Two important considerations about our sample of students limit any
attempt to generalize our findings. First, we deliberately chose
occupational specialties for which the services required above average mental
qualifications. While the types of classroom measures employed in these
courses would most likely be found in other courses with similar
requirements, we can say little about the evaluation procedures for less
demanding specialties. Second, in part because of the nature of the
specialties we chose, our sample contained relatively few members of
minority groups and very few women. This fact limited the power of our
statistical analysis of these subgroups, and allowed only first-level
comparisons (that is, white versus nonwhite; male versus female).
Nevertheless, even at this level, we believe we have identified some important
differences and gaps in the available data for determining the success of
training outcomes. These differences and gaps, together with other
findings from our analyses, strongly suggest the need for further, more
targeted evaluation of its training efforts by the military.






Chapter 2 

The Quality of Military Recruits: 1981-89 



In 1980, there were 2.4 million more American youths aged 18-21 than 
there are today. This age group, which now numbers 15 million, will 
diminish to 13.5 million by the mid-1990's. This 15-year, 22-percent
decline in the population from which the all-volunteer force draws its
new personnel must be a matter of concern to military recruiters. The 
concern is exacerbated when we consider the technological aptitude of 
the potential recruit pool: it appears that the graduates of our public 
schools are becoming less technologically literate when compared to 
their peers in other developed nations — and this decline is occurring just 
as our weapons systems are reaching new heights of technological 
sophistication. 
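The 22-percent figure can be checked from the population counts given in this paragraph. In the worked arithmetic below, the symbols N with subscripts are our own shorthand for the size of the 18-21 age group (in millions) and do not appear elsewhere in the report.

```latex
\[
N_{1980} = 15 + 2.4 = 17.4, \qquad
\frac{N_{1980} - N_{\text{mid-1990's}}}{N_{1980}}
  = \frac{17.4 - 13.5}{17.4}
  = \frac{3.9}{17.4} \approx 0.22
\]
```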

However, by the standards set by DOD, the quality of military recruits in
the first half of the 1980's did not decline in proportion to the dwindling
numbers in the recruit pool. As we have noted in the previous chapter,
DOD reported "the most remarkable turnaround in peacetime history"
between 1980 and 1986, with dramatic increases in the proportion of
recruits who had graduated from high school and who scored in the top
three AFQT categories.

In this chapter, we will address our first evaluation question: How has
the aptitude of recruits for technologically sophisticated specialties
changed since 1980? Our purpose is threefold: (1) to determine whether
the quality gains as defined and reported by the services in the first half
of the 1980's are being maintained; (2) to expand the definition of
quality to include other measures beyond those traditionally reported
(that is, high school graduation and service-defined mental category);
and (3) to examine in greater detail two occupational specialties that, by
service definition, require higher entry levels of technological
sophistication. We will report the trends we found in the scores achieved by
recruits from fiscal year 1981 through fiscal year 1989 on some of the
various subtests and composites of the Armed Services Vocational
Aptitude Battery (ASVAB), the instrument used by all services to both qualify
applicants for entry and classify recruits into occupational specialties.
We will examine in detail those scores that are used by the services to
qualify recruits for more technologically demanding specialties.



Armed Services
Vocational Aptitude
Battery (ASVAB)

ASVAB is composed of ten subtests measuring abilities considered
important for military service. Scores from ASVAB subtests are combined to
form composite scores thought to be related to general types of
occupational specialties within the armed forces. While different services use
different methods to combine subtest scores into composites, all services
use the same component subtests for two composite scores, the Armed
Forces Qualification Test (AFQT) and the Electronics Composite. We
examined these two in detail to determine how they have changed
during the 1980's.



Armed Forces
Qualification Test (AFQT)



An AFQT score is currently derived from a recruit's scores on four ASVAB
subtests: Word Knowledge, Paragraph Comprehension, Arithmetic
Reasoning, and Mathematics Knowledge.^1 AFQT scores are the primary
mental criterion for entry into the armed services. Figure 2.1 displays
the mean composite AFQT scores for men and women from 1981 through
1989. Actual mean scores for this period may be found in appendix I.



Figure 2.1: Mean AFQT Scores, by Gender: 1981-89

[Line graph: mean AFQT score by year, 1981-89, with separate lines for male and female recruits.]

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source: Data are from the Defense Manpower Data Center.



^1 Before 1989, AFQT scores were computed differently. In order to maintain comparability, we
computed AFQT scores of all recruits using the 1989 definition and the standard subtest scores provided
by the Defense Manpower Data Center.
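The 1989 scoring rule described in the note to figure 2.1 can be sketched as follows. This is a minimal illustration, not DOD's scoring software: the class and function names are hypothetical, and the Verbal standard score is taken as an input because its derivation is not spelled out here.

```python
from dataclasses import dataclass


@dataclass
class AfqtInputs:
    """Standard scores needed for the 1989 AFQT formula (hypothetical container)."""
    arithmetic_reasoning: float
    mathematics_knowledge: float
    verbal: float  # the "Verbal standard score" named in the figure notes


def afqt_1989(scores: AfqtInputs) -> float:
    """Arithmetic Reasoning + Mathematics Knowledge + 2 x Verbal."""
    return (
        scores.arithmetic_reasoning
        + scores.mathematics_knowledge
        + 2 * scores.verbal
    )
```

Under this rule a recruit with a standard score of 50 on each input would have an AFQT score of 200, which is the neighborhood of the mean scores plotted in figure 2.1.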




Overall AFQT scores improved approximately eight points between 1981
and 1989. This improvement occurred among both male and female
recruits. However, despite fluctuations over the years, the scores of
male recruits began and ended the decade slightly higher than female
scores. Male scores continued to increase each year until 1988, although
their rate of increase was greatest in the first four years. Female scores
improved dramatically from 1981 to 1983 but then flattened out, so that
by the end of the decade they were lower than in any year since 1985.

AFQT scores differed more substantially across racial/ethnic groupings
than between genders. (See figure 2.2.) White recruits began the decade
with scores approximately 21 points higher than minority recruits. By
1989, this difference had shrunk to 15 points. The bulk of the relative
gain by minority recruits, however, had occurred by 1985, and any
narrowing of this gap since then has been slight.



Figure 2.2: Mean AFQT Scores, by Race/Ethnicity: 1981-89

[Line graph: mean AFQT score by year, 1981-89, with separate lines for white, black, Hispanic, and other recruits.]

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source: Data are from the Defense Manpower Data Center.




Mean AFQT scores in all services were significantly higher in 1989 than
in 1981. (See figure 2.3.) Army recruits showed the greatest gain.
Average Army scores were substantially lower than those of other
services at the beginning of the decade, but by 1986 they had increased to
approximately the same level as scores achieved by Navy and Marine
recruits. Navy scores peaked in 1983 and have declined somewhat
slowly and erratically since then to a level less than 2 points higher than
they were at the beginning of the decade. Air Force AFQT scores have
consistently averaged higher than the other services' and have not
displayed their tendency to plateau at mid-decade levels.



Figure 2.3: Mean AFQT Scores, by Service: 1981-89

[Line graph: mean AFQT score by year, 1981-89, with separate lines for the Army, Navy, Air Force, and Marine Corps.]

Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
as of January 1, 1989.

Source: Data are from the Defense Manpower Data Center.

Figure 2.4 displays the service-wide mean scores on each of the four
component subtests that make up AFQT. For two of the subtests, Word
Knowledge and Paragraph Comprehension, the pattern is quite similar,
with the sharpest gains occurring by 1985, and little change thereafter.
Scores in Mathematics Knowledge and Arithmetic Reasoning increased
substantially between 1981 and 1984. Arithmetic Reasoning scores
declined after that point, but scores in Mathematics Knowledge have
continued to rise and were the only subtest scores to increase from fiscal
year 1988 to fiscal year 1989.



Figure 2.4: Mean AFQT Subtest Scores, 1981-89

[Line graph: mean standard score by year, 1981-89, with separate lines for Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, and Mathematics Knowledge.]

Source: Data are from the Defense Manpower Data Center.



Electronics Composite 
Scores 



The Electronics Composite score is defined by each service as the sum of
four subtest scores: Arithmetic Reasoning, Mathematics Knowledge,
Electronics Information, and General Science. Figure 2.5 displays the
mean Electronics Composite score for men and women from 1981
through 1989. Figure 2.6 presents the same information by racial/ethnic
grouping.
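The definition above can be sketched in the same way as the AFQT rule, and doing so makes the overlap between the two composites explicit. The dictionary keys and function name below are hypothetical labels chosen for this illustration.

```python
# Subtests entering each composite, as listed in this chapter.
AFQT_SUBTESTS = (
    "word_knowledge",
    "paragraph_comprehension",
    "arithmetic_reasoning",
    "mathematics_knowledge",
)
ELECTRONICS_SUBTESTS = (
    "arithmetic_reasoning",
    "mathematics_knowledge",
    "electronics_information",
    "general_science",
)


def electronics_composite(standard_scores: dict) -> float:
    """Sum of the four subtest standard scores in the Electronics Composite."""
    return sum(standard_scores[name] for name in ELECTRONICS_SUBTESTS)


# Subtests that count toward both composites.
SHARED_SUBTESTS = sorted(set(AFQT_SUBTESTS) & set(ELECTRONICS_SUBTESTS))
```

Two of the four Electronics Composite subtests (Arithmetic Reasoning and Mathematics Knowledge) also enter AFQT, so trends in the two composites cannot be fully independent.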






Figure 2.5: Mean Electronics Composite Scores, by Gender: 1981-89

[Line graph: mean Electronics Composite score by year, 1981-89, with separate lines for male and female recruits.]

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

Source: Data are from the Defense Manpower Data Center.






Figure 2.6: Mean Electronics Composite Scores, by Race/Ethnicity: 1981-89

[Line graph: mean Electronics Composite score by year, 1981-89, with separate lines for white, black, Hispanic, and other recruits.]

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

Source: Data are from the Defense Manpower Data Center.

Electronics Composite mean scores rose approximately 3-1/2 points
between 1981 and 1989. They peaked in 198? and experienced a gradual
decline thereafter. Female recruits scored approximately 11 points
lower than male recruits during this period.


Because of the overlap between the Electronics Composite and AFQT, the
racial differences are similar. In 1981, white recruits scored
approximately 24 points higher than minorities on this composite. By 1989, the
gap had narrowed to approximately 19 points, but most of these gains
by minorities were attained in the earlier part of the decade. By 1989,
the scores of all racial groups were declining.

The interservice pattern of Electronics Composite scores is again similar
to the AFQT patterns discussed previously. (See figure 2.7.) Army scores
progressed from an average of ten points lower than the next closest
service in 1981 to being essentially the same as Navy and Marine scores
by 1986. Mean scores for these three services changed very little from
1985 to 1988, but Army and Navy scores declined significantly in 1989.
Air Force scores have remained higher than other services' but have
fluctuated irregularly since 1984.



Figure 2.7: Mean Electronics Composite Scores, by Service: 1981-89

[Line graph: mean Electronics Composite score by year, 1981-89, with separate lines for the Army, Navy, Air Force, and Marine Corps.]

Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

Source: Data are from the Defense Manpower Data Center.

The trends during this period were not the same for all the subtests that
comprise the Electronics Composite score. (See figure 2.8.) Scores in
General Science and Mathematics Knowledge increased steadily over
these years. Scores in Arithmetic Reasoning increased from 1981 to
1983 but by 1986 had declined again and have since remained relatively
constant. In 1981, recruits scored higher in Electronics Information than
in the other component subtests, but by 1988 the scores were lower than
for other subtests and lower even than they had been at the beginning of
the decade. In 1989, they declined further.






Figure 2.8: Mean Electronics Composite Subtest Scores, 1981-89

[Line graph: mean standard score by year, 1981-89, with separate lines for Arithmetic Reasoning, General Science, Electronics Information, and Mathematics Knowledge.]

Source: Data are from the Defense Manpower Data Center.



Number of Recruits 
Qualified for High 
Technology Specialties 



An alternative method for examining trends in recruit qualifications is
to enumerate the recruits whose ASVAB scores meet the minimum
standards required for entry into certain occupational specialties.
Each service defines "cutting scores" for classifying recruits; that is, a
minimum score on one or more ASVAB composites is required for entry
into training for each specialty.^2 This score can be adjusted to control
flow into specialties as needed. We chose two of the more demanding
specialties, both of them in the Air Force, and computed the number of
recruits into each service from 1981 to 1989 whose ASVAB scores would
have qualified them for technical training in these specialties. We chose
these specialties as examples of high technology military occupations
because they share cutting scores with a number of other
technologically oriented specialties. Our purpose was not to imply either a surplus
or deficit of requisite manpower.
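The counting procedure described above can be sketched as a simple filter. The minimum scores below are the ones cited in this chapter's footnotes for the two Air Force specialties; the specialty keys, the composite labels, and the reading of a cutting score as "at least this value" are assumptions made for illustration.

```python
from typing import Iterable, Mapping

# Minimum composite scores cited in this chapter's footnotes.
# The dictionary keys are hypothetical labels.
CUTTING_SCORES = {
    "control_and_warning_radar": {"electronics": 230},
    "systems_repair_technician": {"electronics": 235, "mechanical": 247},
}


def qualifies(composites: Mapping[str, float], specialty: str) -> bool:
    """True if every required composite meets or exceeds its cutting score."""
    required = CUTTING_SCORES[specialty]
    return all(
        composites.get(name, float("-inf")) >= minimum
        for name, minimum in required.items()
    )


def count_qualified(recruits: Iterable[Mapping[str, float]], specialty: str) -> int:
    """Number of recruits whose composite scores qualify for the specialty."""
    return sum(1 for composites in recruits if qualifies(composites, specialty))
```

A recruit with no recorded score on a required composite is treated as not qualifying, which is a conservative choice for this sketch rather than a documented service rule.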



^2 Other qualifications may also apply, for example, possession of a valid driver's license, special
physical qualifications, or the ability to obtain appropriate levels of security clearance.




Figure 2.9 depicts the number of recruits during the period in question
who would have qualified for training as control and warning radar
specialists in the Air Force on the basis of their ASVAB scores.^3 In 1981,
approximately 38,000 recruits qualified for this specialty. By 1986, the
number of recruits qualifying had risen to more than 69,000, but since
then the number has declined to just under 58,000. In 1981, 87 percent
of the recruits qualifying for training as control and warning radar
specialists were white males, although only about two-thirds of 1981
recruits were white males. These proportions had not changed
substantially by 1989, when white males comprised 84 percent of qualified
recruits but only 61 percent of the general recruit population.



Figure 2.9: Number of Recruits Qualifying for Training as Control and Warning Radar Specialists, 1981-89

[Graph: number of recruits qualifying, by year, 1981-89. Legend includes white male.]

Source: Data are from the Defense Manpower Data Center.

Because the total manpower quotas for the services have varied over
this period, we also computed the percent of all recruits within the
gender and racial/ethnic groups who qualified for this specialty. The
results are displayed in figure 2.10.

^3 We used the cutting score that was current for Air Force recruits in May 1989: an Electronics
Composite score of 230.




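[Figure 2.10. Line graph: percent of recruits within each gender and racial/ethnic group who qualified for this specialty. Legend: white male, nonwhite male, white female, nonwhite female.]

Source: Data are from the Defense Manpower Data Center.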

While nearly a third of white males who entered the services during this 
period qualified on the basis of their Electronics Composite scores for 
this occupational specialty, fewer than 15 percent of white females 
qualified. Fewer than 10 percent of minority males and approximately 3 
percent of minority females qualified. 
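The within-group percentages discussed above amount to dividing the number of qualifying recruits in each group by that group's total. The sketch below is illustrative only: the group labels and input layout are hypothetical, and the default cutoff of 230 is the Electronics Composite cutting score cited for control and warning radar specialists.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def percent_qualified_by_group(
    recruits: Iterable[Tuple[str, float]], cutoff: float = 230
) -> Dict[str, float]:
    """Percent of each group whose Electronics Composite meets the cutoff.

    `recruits` is an iterable of (group label, Electronics Composite) pairs.
    """
    totals = defaultdict(int)
    qualified = defaultdict(int)
    for group, score in recruits:
        totals[group] += 1
        if score >= cutoff:
            qualified[group] += 1
    return {group: 100.0 * qualified[group] / totals[group] for group in totals}
```

Computing the rate within each group, rather than each group's share of all qualifiers, keeps the comparison from being distorted by changes in the overall size or makeup of the recruit population.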

The demographic differences are even more sharply defined when the 
occupational specialty of Systems Repair Technician is examined. (See 
figures 2.11 and 2.12.) 







[Figure 2.11. Graph: number of recruits qualifying for training as systems repair technicians.]

Source: Data are from the Defense Manpower Data Center.






Figure 2.12: Percent of Recruits Qualifying for Training as Systems Repair Technicians, 1981-89

[Line graph: percent of recruits qualifying (vertical axis from 0 to 15.0), by year, 1981-89, with separate lines for white male and other recruits.]

Source: Data are from the Defense Manpower Data Center.

In 1981, 16,563 recruits met the demanding qualifications for training in
this field.^4 The number of qualified recruits increased sharply by 1983,
but by the end of the decade it had dropped to within 700 of its 1981
level. The vast majority of these were white males, of whom
approximately 11 percent qualified. Fewer than 2 percent of our other
demographic groups met the qualifications.




Summary and 
Conclusions 



As we approach the twenty-first century, the sophistication of our
weapons systems can be expected to impose greater demands on the
technological competence of the individual members of the armed
forces. In addition, the youth pool from which the services will draw
their recruits will become increasingly female and minority. And
although we cannot foresee how reduced political tensions may ease the
demands on this pool, our examination of recruit quality trends during
the 1980's is not reassuring concerning the military's ability to meet
these challenges.



^4 This specialty requires an ASVAB Electronics Composite score of 235 and a mechanical score of 247,
requirements that rank it among the most challenging fields in all of the services.






AFQT scores and, to a lesser extent, Electronics Composite scores are 
higher now than they were in 1981, yet both have begun to decline. The 
Electronics Information subtest scores are lower than they were in 1981, 
and General Science scores have dropped to near their 1981 level. Thus, 
fewer recruits are qualifying for the more demanding technical occupa- 
tional specialties. 

Women and minorities have traditionally scored lower in these areas. 
While the gap between white males and other recruits narrowed some- 
what in the early 1980's, since mid-decade the race and gender differ- 
ences have remained fairly constant. As we discussed in the previous 
chapter, women and minorities will form the bulk of the new-entry labor 
pool by the year 2000, and therefore providing well-trained personnel 
for a technologically sophisticated military can be expected to become 
increasingly difficult. The burden on training will increase, and with it 
will come the need to monitor the effectiveness of this training as recruit 
demographics shift. 

In the following chapters, we will address the services' current ability to 
measure the effectiveness of their training in technologically demanding 
areas. We will also examine the differences among gender and racial/ 
ethnic groupings, and the ability of the AFQT and Electronics Composite
scores to predict success in technical military specialties. 







Chapter 3 



Classroom Measures of Training Effectiveness



In this chapter, we address our second evaluation question: How useful 
are the data collected by the services before and during classroom 
training for selecting individuals for high technology roles and for eval- 
uating the effectiveness of this training? Although we reviewed a broad 
spectrum of evaluation-related materials and activities performed by 
the services at the classroom level, we concentrated on the course 
grades assigned at the end of training and, in some cases, at
intermediate stages during the training process. Our intention was to define the
extent to which appropriate data were available to the services and to
external reviewers from which some judgments could be made about
training effectiveness. We did not attempt to perform an evaluation of
individual curricula, training sites, or instructors.

Our primary criterion for selecting courses for review was that the
qualifying score for course entry, as established by the service, was
relatively high. In addition, we considered annual trainee throughput and
the recent stability of the course curriculum. Nearly all the courses
which met our criteria were in the electronics area, and most involved
the use, maintenance, and repair of electronic equipment, particularly 
radar or sonar. We collected the course grades associated with advanced 
individual training for 13 occupational specialties, four each in the Navy 
and Air Force, and five in the Army. Some of the data were collected at 
the training site, and some from centrally computerized records. 

Because of large differences between the services in annual throughput 
of trainees in these courses, the size of our sample varied widely across 
services. This variation was increased by problems we encountered con- 
cerning the usefulness of certain data provided by the Army (see the 
following section), as well as by our decision to supplement our already 
sizable Navy data base with relevant data previously collected by the 
Navy for research purposes. Our final sample consisted of more than
6,000 sailors, nearly 1,000 Air Force personnel, and fewer than 300
soldiers. In this chapter, we present the results of our analysis
separately for each service.

We examined the course data for their apparent reliability (that is, for
their apparent ability to discriminate meaningfully between
performances of trainees) as well as for differences in training outcomes among
the demographic groupings discussed in the previous chapter. We also
examined the relationship between training outcomes and individual
abilities, as measured by ASVAB, in order to estimate the power of the
selection criteria to predict performance in training.






Army 



The Army specialties for which we collected data are listed in table 3.1.



Table 3.1: Army Occupational Specialties Reviewed

Specialty  Title                                     Location                Electronics Composite qualifying score^a
24J        Hawk pulse radar repairer                 Redstone Arsenal, Ala.  217
27N        Forward area alerting radar repairer      Redstone Arsenal, Ala.  217
29V        Strategic microwave systems repairer      Fort Gordon, Ga.        217
36L        Transportable automatic systems operator  Fort Gordon, Ga.        217
39B        Automatic test equipment operator         Fort Gordon, Ga.        217

^a Sum of subtest standard scores

We found that the course grades for these five specialties were not 
equally reliable indicators of performance during training. Whereas for 
the two classes at Redstone Arsenal final grades were a simple arith- 
metic average of intermediate measures of performance, at Fort Gordon 
we were unable to find a consistent relationship between individual 
milestone measures and final grades, nor were we able to locate anyone 
at Fort Gordon who could suggest one. We concluded that the grades 
recorded for two of these courses (36L and 39B) could not be used to 
discriminate reliably between the performances of individual trainees. 
We found inconsistencies in scoring procedures between different
classes and even within the same class. Finally, we discovered that the
Fort Gordon grades (unlike those at Redstone) were based partially on
measures of physical conditioning that appeared to be unrelated to job
performance.



For a third training course at Fort Gordon (29V), however, we were able
to generate what we judged to be reasonable measures of performance
for some classes. For these classes, we developed an algorithm to
produce scores based only on those nonconstant measures that were related
to general or applied electronics training.^1



^1 External corroboration of the preferability of this improvised scoring procedure was provided by
our later analysis of the relationship between grades and ASVAB. The correlation between original
29V grades and the Electronics Composite was negative and nonsignificant. The revised grades were
positively (.50) and significantly correlated (p < .01) with this ASVAB score.
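The report does not give the details of the rescoring algorithm, so the following is only a sketch of one way the stated idea could be carried out: keep the electronics-related measures, drop any measure on which every trainee in the class received the same score, and average what remains. The averaging step, the data layout, and all names are assumptions.

```python
from statistics import mean
from typing import Dict, List, Set


def revised_grades(
    class_records: List[Dict[str, float]], electronics_measures: Set[str]
) -> List[float]:
    """Average each trainee's scores on the usable electronics measures.

    A measure is usable if it is electronics-related and its score varies
    across the trainees in the class (that is, it is nonconstant).
    """
    usable = [
        measure
        for measure in sorted(electronics_measures)
        if len({record[measure] for record in class_records}) > 1
    ]
    if not usable:
        raise ValueError("no nonconstant electronics measures in this class")
    return [mean(record[m] for m in usable) for record in class_records]
```

A measure on which every trainee receives the same score cannot distinguish one trainee from another, which is why constant measures are excluded before averaging.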






Our final sample was therefore composed of U.S. Army trainees from
those 24J and 27N classes conducted in fiscal years 1985 through 1988
whose records were available at the time of our visit, and approximately
one third of the 29V trainees from the same period. Table 3.2 presents
the mean scores of this sample on AFQT, the Electronics Composite of
ASVAB, and course grades.^2



Table 3.2: Mean Scores on Predictor and Criterion Variables, Army

            AFQT                       Electronics Composite    Grade
Category    Number       Mean^a        Number    Mean^a         Number    Mean
Male        [illegible]  [illegible]   280       238.46         232       89.23
Female      23           232.47        23        230.13         23        83.08
White       255          234.00        255       240.00         160       90.19
Nonwhite    48           222.67        48        226.29         95        86.86
Total       303          232.20        303       237.83         255       88.95

^a Sum of subtest standard scores



Male trainees in these courses scored significantly higher than did
females, and white trainees performed better than minority students.
These performance differences correspond to group-level differences in
both AFQT and Electronics Composite scores for racial/ethnic groupings.

The group means presented in table 3.2 also suggest that AFQT and
Electronics Composite scores do not equally predict success in training, at
least for females. While female trainees entered training with
Electronics Composite scores significantly lower than those of males, the
AFQT scores of female and male trainees were equivalent. In other words,
it would appear that Electronics Composite scores are a better indication
of future performance in these occupational specialties than are AFQT
scores. This is consistent with ASVAB's role in the military accession
process: potential recruits are admitted to service on the basis of AFQT
scores, and then are assigned to occupational specialties for which they
qualify on the basis of their scores on other ASVAB composites.

We tested this hypothesis more directly by examining the correlations
between course grades and three ASVAB scores: AFQT, Electronics
Composite, and a "factor score." This last measure is the weighted sum
of all ten ASVAB subtests. We derived this last score by principal
component analysis of ASVAB subtest scores. The results of our
correlation analysis are displayed in table 3.3.
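
A factor score of this kind can be sketched as follows. This is only an
illustration under stated assumptions: the report does not describe its
software or settings, the function names and the five examinees are
hypothetical, three subtests stand in for the ten, and the first
component is found here by power iteration on the correlation matrix.

```python
import math
import statistics


def standardize(column):
    """Rescale one subtest's scores to mean 0 and standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


def first_component(rows, iterations=500):
    """Return (weights, scores) for the first principal component.

    rows: one tuple of subtest scores per examinee.
    The weights are found by power iteration on the correlation matrix;
    each score is the weighted sum of that examinee's standardized subtests.
    """
    columns = [standardize(col) for col in zip(*rows)]
    n, p = len(rows), len(columns)
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(p)] for i in range(p)]
    weights = [1.0] * p
    for _ in range(iterations):
        weights = [sum(corr[i][j] * weights[j] for j in range(p))
                   for i in range(p)]
        norm = math.sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    scores = [sum(weights[j] * columns[j][i] for j in range(p))
              for i in range(n)]
    return weights, scores


# Hypothetical standard scores on three subtests for five examinees.
examinees = [(50, 52, 48), (60, 58, 61), (40, 45, 42), (55, 50, 57), (65, 66, 63)]
weights, factor_scores = first_component(examinees)
print([round(w, 2) for w in weights])
print([round(s, 2) for s in factor_scores])
```

Because every subtest contributes to the weighted sum, the resulting
score draws on more information than a composite built from a subset of
subtests.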



*See appendix II for similar statistics on the course level.



Page 34 GAO/PEMD-91-4 Military Technical-Training Effectiveness Is Unknown



Chapter 3

Classroom Measures of
Training Effectiveness



Table 3.3: Intercorrelation of Study Variables, Army^a

                                    Electronics             Grade^e
Category                  AFQT^b    Composite^c   Factor^d  Raw      Adjusted^f
Total
  AFQT                    1.00      0.81^g        0.84^g    0.29^g   0.41^g
  Electronics Composite   303       1.00          0.89^g    0.43^g   0.59^g
  Factor                  303       303           1.00      0.42^g
  Grade                   189       189           189       1.00
Male
  AFQT                    1.00      0.83^g        0.85^g    0.31^g   0.43^g
  Electronics Composite   280       1.00          0.89^g    0.42^g   0.58^g
  Factor                  280       280           1.00      0.41^g
  Grade                   171       171           171       1.00
Female
  AFQT                    1.00      0.82^g        0.87^g    0.42     0.53^g
  Electronics Composite   23        1.00          0.89      0.35     0.51^g
  Factor                  23        23            1.00      0.35
  Grade                   18        18            18        1.00
White
  AFQT                    1.00      0.80^g        0.82^g    0.24^g   0.38^g
  Electronics Composite   255       1.00          0.87^g    0.40^g   0.60^g
  Factor                  255       255           1.00      0.40^g
  Grade                   154       154           154       1.00
Nonwhite
  AFQT                    1.00      0.78^g        0.85^g    0.19     0.22
  Electronics Composite   48        1.00          0.89^g    0.30     0.40
  Factor                  48        48            1.00      0.26
  Grade                   35        35            35        1.00

^a Correlation coefficients are in upper diagonal and number in lower diagonal.
^b AFQT = sum of subtest standard scores
^c Electronics Composite = sum of subtest standard scores for Electronics Composite
^d Factor = score from first factor from principal component analysis
^e Grade = final course grade
^f Adjusted = correlation adjusted for restriction of range
^g p < .05

For our whole Army sample, the variation within Electronics Composite
scores explains approximately 18 percent of the variation within course
grades, more than factor scores and substantially more than AFQT.^ In
most cases, Electronics Composite scores are somewhat better predictors
of grades than are AFQT scores, whether a simple correlation coefficient
or a coefficient adjusted for range restriction is used as a criterion.^
This is not true, however, for female soldiers, for whom AFQT predicts
classroom performance better than the Electronics Composite does. In most
cases, ASVAB factor scores provide stronger predictions than either AFQT
or the Electronics Composite. Our ability to predict course grades from
any of the three ASVAB scores is weakest for minority soldiers as a group.

Our analysis of nonwhite and female soldiers is unfortunately based on
a relatively small sample. Nevertheless, it suggests that AFQT or some
other general score from ASVAB may provide a better predictor of success
for women recruits in electronics-related training than does the
Electronics Composite score. It also indicates that we need better
predictors than we currently have for minority students.
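
The caution about sample size can be made concrete with the usual t
statistic for a Pearson correlation. The sketch below is illustrative
only: the report does not say which significance test was used, so the
t test is an assumption, and the function name is hypothetical. The
inputs are the raw AFQT-grade correlations and grade sample sizes
reported for male and female soldiers in table 3.3.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least three paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Raw AFQT-grade correlations and grade sample sizes from table 3.3.
t_male = correlation_t_statistic(0.31, 171)
t_female = correlation_t_statistic(0.42, 18)
print(f"male:   t = {t_male:.2f} on {171 - 2} degrees of freedom")
print(f"female: t = {t_female:.2f} on {18 - 2} degrees of freedom")
```

With 16 degrees of freedom the two-tailed .05 critical value of t is
about 2.12, so under this assumed test the larger female correlation
falls short of significance, while the smaller male correlation, based
on 171 cases, clears its threshold easily.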



Navy 






We examined four Navy training courses, two each from the Antisubmarine
Warfare School in San Diego and the Naval Air Station in
Millington, Tennessee. They are listed in table 3.4.



^A correlation coefficient is the square root of common variance. In this
case, the Electronics Composite score from ASVAB shares 18.5 percent
(.43 squared) of variance with grades, or, after adjustment, 35 percent
(.59 squared).
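
The arithmetic behind these percentages is simply the square of each
correlation. A minimal sketch follows; the function name is
hypothetical, and the two inputs are the raw and adjusted Electronics
Composite-grade correlations quoted in the footnote.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common, given correlation r."""
    return r * r


raw = shared_variance(0.43)       # Electronics Composite vs. grade, raw
adjusted = shared_variance(0.59)  # same pair, adjusted for range restriction
print(f"raw: {raw:.1%}  adjusted: {adjusted:.1%}")
```

The adjusted figure comes to just under 35 percent, which the footnote
rounds to a whole number.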

^The adjustment for restriction in range is common among psychometricians
and appears in all DOD reports that we reviewed. Since correlations are
simply measures of the extent to which two measures vary in common, any
restriction to the variation of one of the measures results in an
underestimate of their common variation. This restriction occurs when the
sample includes only one end of a spectrum of scores, as is the case for
any measure used for selection purposes. Our sample includes only those
whose AFQT scores were sufficiently high to permit acceptance into
military service. The adjusted correlation coefficient represents the
hypothetical relationship between the ASVAB measure and course grades if
this range restriction did not exist for our sample.
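
One common form of this adjustment can be sketched as below. It assumes
the Thorndike Case II formula for direct restriction on the predictor;
the report does not state which formula was applied. The function name
and both standard deviations are hypothetical values chosen only for
illustration, so the result is not expected to reproduce the adjusted
coefficients in the tables.

```python
import math


def correct_for_range_restriction(r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Estimate the correlation in the unrestricted population.

    Thorndike Case II: r_c = r*k / sqrt(1 - r^2 + r^2 * k^2),
    where k is the ratio of unrestricted to restricted predictor SD.
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1.0 - r * r + r * r * k * k)


observed_r = 0.43      # raw Electronics Composite-grade correlation, Army
sd_applicants = 30.0   # hypothetical SD in the full applicant pool
sd_trainees = 20.0     # hypothetical SD among the selected trainees
corrected = correct_for_range_restriction(observed_r, sd_applicants, sd_trainees)
print(f"observed r = {observed_r:.2f}, corrected r = {corrected:.2f}")
```

When the two standard deviations are equal the formula returns the
observed correlation unchanged; the narrower the selected group's
spread relative to the applicant pool, the larger the upward
adjustment.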






Table 3.4: Occupational Specialties Reviewed, Navy

Specialty   Title                                     Location
STG         Sonar technician, antisubmarine           San Diego, Calif.
            warfare, surface
STS         Sonar technician, antisubmarine           San Diego, Calif.
            warfare, subsurface
AQ          Aviation fire control technician          Millington, Tenn.
AX          Aviation antisubmarine warfare            Millington, Tenn.
            technician

Electronics Composite qualifying score^a: 218, 218 [only two entries are
legible; the rows they belong to cannot be determined]

^a Sum of subtest standard scores



We were able to achieve a much larger sample size (6,156) for these
courses than was the case for our Army courses (303) because of their
larger annual throughput, and because the Naval Personnel Research
and Development Center provided us with relevant data that they had
collected on STS and STG specialties for fiscal years 1986 and 1987.
These data supplemented the fiscal year 1988 and fiscal year 1989 data
that we collected at the San Diego base. Millington provided us with
training data for 1987 and 1988. Table 3.5 presents the mean scores on
the two ASVAB composites and course grades for the entire Navy sample.
Statistics on individual courses are presented in appendix II.



Table 3.5: Mean Scores on Predictor and Criterion Variables, Navy

            AFQT                   Electronics Composite  Grade
Category    Number   Mean^a        Number   Mean^a        Number   Mean
Male        6,080    229.60        6,080    235.33        5,882    89.[illegible]
Female      76       235.59        76       230.66        71       90.70
White       5,355    230.49        5,355    236.25        5,179    89.21
Nonwhite    801      224.18        801      228.75        1,159    89.58
Total       6,156    229.67        6,156    235.28        6,443    89.30

^a Sum of subtest standard scores



Male recruits entered training with significantly lower AFQT scores and
significantly higher Electronics Composite scores than those for females.
Final grades for males were slightly, but significantly, lower than those
for their female classmates. These results suggest that, at least for
females, a substantial advantage in AFQT can overcome a disadvantage
in the Electronics Composite. In addition, minority students began
training with substantially lower scores than nonminorities on both AFQT
and the Electronics Composite. The final grades of the two groups were
not significantly different.

The results of our correlation analysis appear in table 3.6. They suggest
that AFQT may be more important for training success than the
Electronics Composite. For most Navy groupings, AFQT scores are better
predictors of classroom performance than are Electronics Composite
scores. When adjusted, they explain from 12 to 38 percent of the
variation in course grades. Once again, the Electronics Composite is the
weakest of the three predictors for female sailors, and the more general
factor score is the strongest. The ability of any of the three ASVAB
scores to predict training success is weakest for minorities.






Table 3.6: Intercorrelation of Study Variables, Navy^a

                                    Electronics             Grade^e
Category                  AFQT^b    Composite^c   Factor^d  Raw      Adjusted^f
Total
  AFQT                    1.00      0.79^g        0.8[?]^g  0.30^g   0.46^g
  Electronics Composite   6,156     1.00          0.85^g    0.27^g   0.46^g
  Factor                  6,156     6,156         1.00      0.28^g
  Grade                   5,939     5,939         5,939     1.00
Male
  AFQT                    1.00      0.79^g        0.81^g    0.30^g   0.46^g
  Electronics Composite   6,080     1.00          0.85^g    0.27^g   0.46^g
  Factor                  6,080     6,080         1.00      0.27^g
  Grade                   5,868     5,868         5,868     1.00
Female
  AFQT                    1.00      0.74^g        0.81^g    0.39^g   0.62^g
  Electronics Composite   76        1.00          0.82^g    0.32^g   0.55^g
  Factor                  76        76            1.00      0.39^g
  Grade                   71        71            71        1.00
White
  AFQT                    1.00      0.79^g        0.81^g    0.30^g   0.47^g
  Electronics Composite   5,355     1.00          0.85^g    0.29^g   0.50^g
  Factor                  5,355     5,355         1.00      0.30^g
  Grade                   5,165     5,165         5,165     1.00
Nonwhite
  AFQT                    1.00      0.74^g        0.77^g    0.22^g   0.34^g
  Electronics Composite   801       1.00          0.81^g    0.14^g   0.25^g
  Factor                  801       801           1.00      0.11^g
  Grade                   774       774           774       1.00

^a Correlation coefficients are in upper diagonal and number in lower diagonal.
^b AFQT = sum of subtest standard scores
^c Electronics Composite = sum of subtest standard scores for Electronics Composite
^d Factor = score from first factor from principal component analysis
^e Grade = final course grade
^f Adjusted = correlation adjusted for restriction of range
^g p < .05



Air Force

The Air Force training courses we reviewed are listed in table 3.7.
Our sample size from these courses totaled 922. Statistics for individual
courses are provided in appendix II. (We received both training and
demographic data on all of these courses from the Air Force Human
Resources Laboratory.)



Table 3.7: Occupational Specialties Reviewed, Air Force

                                                                 Electronics
                                                                 Composite
                                                                 qualifying
Specialty   Title                            Location            score^a
30332       Aircraft control and             Keesler AFB, Miss.  230
            warning radar specialist
30333       Automatic tracking radar         Keesler AFB, Miss.  225
            specialist
45530A      Photo-sensors                    Lowry AFB, Colo.    225
            maintenance specialist,
            tactical reconnaissance
            sensors
45530B      Photo-sensors                    Lowry AFB, Colo.    2[?]5
            maintenance specialist,
            reconnaissance electro-
            optical sensors

^a Sum of subtest standard scores



Trainees' ASVAB scores and course grades are displayed in table 3.8. As
would be expected, ASVAB scores for Air Force students are significantly
higher than those for the other services we reviewed. In addition, we
found a higher proportion of female trainees in the Air Force courses
than in the Army and Navy courses we reviewed.



Table 3.8: Mean Scores on Predictor and Criterion Variables, Air Force

            AFQT                   Electronics Composite  Grade
Category    Number   Mean^a        Number   Mean^a        Number   Mean
Male        824      235.45        824      241.94        854      91.31
Female      98       237.73        98       235.88        100      89.91
White       825      236.22        825      241.95        855      91.21
Nonwhite    97       231.19        97       235.73        99       90.76
Total       922      235.69        922      241.30        954      91.16

^a Sum of subtest standard scores



Male Air Force recruits entered training with substantially higher
Electronics Composite scores and slightly, but significantly, lower AFQT
scores than did female recruits. Despite the slight female AFQT
advantage, male recruits ended training with higher course grades than
those earned by female recruits. In addition, although white students
began training with substantially higher ASVAB scores, their final grades
were not significantly different from those of their nonwhite classmates.






As table 3.9 demonstrates, the correlations between ASVAB and Air Force
training grades followed much the same pattern as did the Navy's. When
correlations are adjusted, the traditional ASVAB composite scores explain
from 6 to 36 percent of classroom performance. Factor scores are as
good as, or better than, composites as predictors. For female students,
AFQT scores outpredict Electronics Composite scores. Once again, it is
most difficult to predict course grades for minority students, although
factor scores explained 10 percent of their classroom performance.






Table 3.9: Intercorrelation of Study Variables, Air Force^a

                                    Electronics             Grade^e
Category                  AFQT^b    Composite^c   Factor^d  Raw      Adjusted^f
Total
  AFQT                    1.00      0.71^g        0.75^g    0.29^g   0.44^g
  Electronics Composite   922       1.00          0.84^g    0.33^g   0.54^g
  Factor                  922       922           1.00      0.35^g
  Grade                   922       922           922       1.00
Male
  AFQT                    1.00      0.74^g        0.77^g    0.30^g   0.44^g
  Electronics Composite   824       1.00          0.84^g    0.33^g   0.54^g
  Factor                  824       824           1.00      0.34^g
  Grade                   824       824           824       1.00
Female
  AFQT                    1.00      0.68^g        0.77^g    0.35^g   0.54^g
  Electronics Composite   98        1.00          0.77^g    0.26^g   0.50^g
  Factor                  98        98            1.00      0.28^g
  Grade                   98        98            98        1.00
White
  AFQT                    1.00      0.72^g        0.75^g    0.31^g   0.47^g
  Electronics Composite   825       1.00          0.83^g    0.35^g   0.58^g
  Factor                  825       825           1.00      0.35^g
  Grade                   825       825           825       1.00
Nonwhite
  AFQT                    1.00      0.65^g        0.68^g    0.19     0.24^g
  Electronics Composite   97        1.00          0.82^g    0.23^g   0.33^g
  Factor                  97        97            1.00      0.31^g
  Grade                   97        97            97        1.00

^a Correlation coefficients are in upper diagonal and number in lower diagonal.
^b AFQT = sum of subtest standard scores
^c Electronics Composite = sum of subtest standard scores for Electronics Composite
^d Factor = score from first factor from principal component analysis
^e Grade = final course grade
^f Adjusted = correlation adjusted for restriction of range
^g p < .05



Summary and
Conclusions



Our review of advanced individual training courses (designed to prepare
recruits in three services to serve in certain "high technology" roles)
identified some problems with the utility of data maintained by
the Army on classroom performance in certain specialties. It would not
be appropriate to make interservice comparisons on the basis of this
finding, however, since much of the Navy training information and all of
the data we received from the Air Force were specially prepared for
research purposes. We cannot therefore make firm judgments about the
immediate availability of psychometrically suitable measures from these
two services.

The psychometric deficiencies we found at Fort Gordon appeared to
result from a number of different factors, including questionable data
entry procedures and software. They are also a function of the pass/fail
nature of the criteria used to evaluate student progress. We cannot
assess the extent to which performance on individual training tasks is
susceptible to more sophisticated measures than "go/no-go," but we
would suggest that subject matter experts attempt to develop more
finely tuned, objective, and reliable measures of performance.

Our review also raised certain questions about differential success in
training for males and females, and for whites and minorities, and about
the differential predictive validity of ASVAB for these subgroups. Our
analysis of gender- and race-related differences in mean ASVAB scores
and course grades in the Army suggested that the Electronics Composite
was an efficient simple predictor of training success. Women and
minorities entered training with significantly lower Electronics
Composite scores and received significantly lower course grades.

Our findings from the Navy and Air Force samples, however, suggest
that a more complex relationship exists between ASVAB and course
grades. For these services, gender- and race-related differences in course
grades were small or nonexistent, despite significant differences in
Electronics Composite scores. The Navy and Air Force samples also differed
from the Army sample in three other respects: (1) Electronics course
grade differences, though significant, were much smaller in the Navy
and Air Force than in the Army; (2) unlike women soldiers, Navy and
Air Force women had significantly higher AFQT scores than their male
classmates; and (3) the AFQT disadvantage for minorities in the Navy
and Air Force was only half of that in the Army. These findings suggest
that an advantage in the more general aptitude measured by AFQT (or by
an even more general measure such as a factor score) can compensate
for a deficit in the Electronics Composite when the deficit is not too
great. In other words, success in training may be related as much to
general ability as to performance on the Electronics Composite.






This interpretation is consistent with the results of our correlation
analyses, which tested the relationship between ASVAB scores and course
grades more directly. While ASVAB's Electronics Composite score
demonstrated a moderate ability to predict success in training for white
male students, it was less successful for female or minority students. The
factor score we derived from ASVAB was in most cases the best simple
predictor of training success because it utilized information from all ten
ASVAB subtests, and not simply from the subset used for AFQT or the
Electronics Composite. However, all three ASVAB measures (AFQT,
Electronics, and factor scores) in most cases proved to be relatively weak
predictors of performance in training for minority students.

Correlations do not imply causality, nor does the lack of a correlation
for a subsample indicate the location of a problem. From our analyses it
is impossible to conclude either that ASVAB is a weaker measure of ability
for some groups, or that some factor in classroom training contributes
differentially to the success of different groups. Yet, as the youth pool
shrinks and its demographic characteristics shift, the military will find
itself turning more toward minority and female recruits. These groups,
as we have seen, consistently score lower in the measures used to assign
recruits to technical training and in our largest service are less likely to
perform well. It will become increasingly incumbent on all services to
optimize selection criteria for technical advanced individual training for
women and minority groups, to provide compensatory training where
needed, and to assure that no extraneous factors within the training
environment interfere with the full development of a recruit's potential.






Chapter 4 

Field Measures of Training Effectiveness 



Whatever criteria may exist to predict or to assess a recruit's
performance in training, the ultimate criterion of training effectiveness
is the recruit's performance on the job. Our third evaluation question
addresses this issue: How well do the services' selection criteria and
training evaluation measures predict success in high technology roles?

To answer this question, we attempted to locate individual
field-performance data routinely collected by the services that could be
linked to our ASVAB and classroom training data to serve as reliable and
valid indicators of training effectiveness. And, although we were made
aware of numerous post-training evaluation activities performed by the
individual services, only the Army could provide us with individual
performance measures. In this chapter, we will examine the quantitative
relationship between these Army data and the other information we
compiled. We will also discuss other evaluation mechanisms used by the
services and suggest a potential alternative source of post-training
evaluation measures.



Army 

Skill Qualification Test

By Army regulation, a soldier's occupational specialty performance is
tested within six months of completion of training and every year
thereafter. These written tests are prepared by the sponsoring training
site. They are administered under the direction of the Skill Qualification
Test (SQT) directorate at Fort Eustis, Virginia, where the resulting data
are stored.

Fort Eustis provided us with the SQT scores of all soldiers who took the
SQT from 1985 to 1988 in the occupational specialties we had chosen for
our sample. Summary statistics for these data are provided in appendix
IV. We matched these scores, where possible, with ASVAB scores and
classroom grades for each soldier included in our training site review.^1
Table 4.1 presents the scores of these soldiers summarized by
demographic groups, together with the correlation coefficient estimating
the relationship between SQT and the measures we examined in the previous
chapter.



^1 For soldiers with multiple SQT scores during this period, we used only the first score.
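
The first-score rule can be sketched as a small filtering step. The
record layout, soldier identifiers, dates, and scores below are all
hypothetical; the report does not describe how the Fort Eustis data
were stored or processed.

```python
from datetime import date

# Hypothetical records: (soldier_id, test_date, sqt_score).
records = [
    ("A001", date(1986, 5, 1), 78),
    ("A001", date(1987, 5, 1), 85),
    ("B002", date(1987, 3, 15), 81),
    ("B002", date(1985, 9, 30), 74),
]


def first_scores(rows):
    """Return {soldier_id: score}, keeping only each soldier's earliest test."""
    earliest = {}
    for soldier_id, test_date, score in rows:
        kept = earliest.get(soldier_id)
        if kept is None or test_date < kept[0]:
            earliest[soldier_id] = (test_date, score)
    return {sid: score for sid, (_, score) in earliest.items()}


print(first_scores(records))
```

Keeping one score per soldier means each person contributes a single
observation to the correlations with SQT.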




Table 4.1: Correlation of SQT and Predictor Variables

                                  Correlation with SQT
                                            Electronics
Category        Mean     Number   AFQT^a    Composite^b   Factor^c  Grade^d
Male            82.12    209
  Raw                             0.21^f    0.28^f        0.36^f    0.47^f
  Adjusted^e                      0.30^f    0.41^f
Female          77.52    21
  Raw                             -0.07     0.12          -0.03     -0.52^f
  Adjusted^e                      -0.10     0.19
White           81.86    144
  Raw                             0.21^f    0.25^f        0.32^f    0.44^f
  Adjusted^e                      0.33^f    0.40^f
Nonwhite        81.45    86
  Raw                             -0.19     0.07          0.12      0.44^f
  Adjusted^e                      -0.22     0.10
Total           81.70    230
  Raw                             0.18^f    0.28^f        0.34^f    0.43^f
  Adjusted^e                      0.26^f    0.41^f

^a AFQT = sum of subtest standard scores
^b Electronics Composite = sum of subtest standard scores for Electronics Composite
^c Factor = score from first factor from principal component analysis
^d Grade = final course grade
^e Adjusted = adjusted for restriction of range
^f p < .05

For the total universe of soldiers the best simple predictor of SQT scores
is final classroom grades, which explain 18.5 percent of the variation in
SQT's. The AFQT and Electronics scores from ASVAB were also
significantly related to SQT's for white males in our sample, but factor
scores consistently outpredicted these composites. For females and for
nonwhite soldiers, however, ASVAB scores were not positively related to
future performance as measured by SQT. Most surprisingly, the grades
scored by female students at the training site were inversely correlated
with their SQT scores; that is, women with higher grades tended to
score lower on SQT's, and vice versa.

The limited size of our sample, especially for female soldiers, makes it
inappropriate to generalize without severe caveats. However, our
analysis suggests that the traditional ASVAB scores may not be the best
predictor of performance for the nontraditional (that is, the female or
minority) soldier. This finding reinforces the concern we expressed in
the last chapter, that better predictors of success for these groups
should be found. Any interpretation of the inverse relationship between
grades and SQT's for women would be purely speculative, but this
anomaly warrants further investigation.



Other Evaluation-Related
Activities

Each Army training site includes an evaluation unit that performs
regular process evaluations. These include classroom observations of
instructors, annual meetings to review curricula, cyclical outreach
programs to contact graduates of the school in the field and their
supervisors, and occasional more intensive curriculum reviews called
training effectiveness analyses.

Classroom observations are conducted on a regular basis by both master
trainers and the training site internal evaluation unit. They are
performed more frequently when instructors are new or have received
less-than-satisfactory evaluations. Most of the observation reports that
we reviewed, particularly those performed by the internal evaluation unit,
were mainly concerned with administrative details. The most frequent
criticism we encountered was that copies of the lesson plan and
curriculum materials were not properly arranged and situated at an empty
desk in the rear of the classroom for the observer.

Schoolhouse external evaluation units also conduct outreach programs
during which members of the units travel to Army bases where a large
concentration of the training-site graduates is stationed to collect
information on the opinions of base staff about training quality. These
reviews occur approximately every two or three years for the courses
we reviewed, but they are not routinely scheduled. They are more
frequently occasioned by indications from the field of training problems,
and their frequency is also affected by travel-budget considerations.

More objective and formal training effectiveness analyses are performed
when a new training course is introduced or when weapons system
modifications prompt major changes in the curriculum. These analyses
include written tests, hands-on tests, and interviews with soldiers and
their supervisors. The most recent training effectiveness analysis for the
courses we reviewed was conducted during the summer of 1987 and was
prompted by changes to the Hawk missile system.






Navy 



Sources of Individual Field
Performance Data

We considered two possible sources of field performance information
routinely collected by the Navy as measures of the effectiveness of the
training courses in our sample: Level II surveys and Advancement in
Rating Examinations. The Level II survey program was designed to
collect information on the job performance of recent training-school
graduates.^2 For each course, questionnaires were sent to the supervisors
of graduates approximately six months after graduation, asking them to
rate individual tasks performed within the specialty (as to their
importance) and the adequacy of the level of training demonstrated by the
course graduates. We found, however, that Level II surveys have been
effectively abandoned by the Navy, and that none has been performed
since at least 1986.



Advancement in Rating Examinations are multiple-choice tests
administered to candidates for promotion who have already been certified
as qualified by their commanding officers. Different tests are prepared for
each promotion cycle, and their results are used to rank candidates.
Because they are not standardized, and are not administered to all
graduates, these tests, in the judgment of test developers and
administrators, are "not a good source of training evaluation feedback."
We concurred with this judgment.



Internal Review of
Evaluation Practices

In 1986, the Chief of Naval Operations requested that the Naval
Training Systems Center (NTSC) determine the current status of Navy
training evaluation and provide recommendations for the future conduct
of such operations. NTSC submitted three reports to the Chief of Naval
Technical Training in 1988. They identified three central evaluation
functions: Level II surveys, the Fleet Training Assessment Program
(FLETAP), and the Training Assessment Survey Team (TAST). The TAST
concept had only recently been established at the time of the NTSC
report, and only two surveys had been completed under the program.
These surveys were limited to new weapons systems and involved fleet
visits to identify training deficiencies and requirements and any
corrective actions that needed to be taken.



^2 The term derives from a classification of evaluation intensiveness
established in 1981 by the Naval Education Training Command. Level I
refers to unsolicited feedback to training sites concerning training
adequacy, Level II to a questionnaire sent to the fleet, and Level III to
an in-depth analysis of problems identified in lower level reviews.






FLETAP is currently a reactive system that attempts to identify training
deficiencies through either direct input from the fleet or review of
reports and other fleet materials. FLETAP is also responsible for
performing Training Quality Reviews, which involve administering job
performance tests to fleet personnel to measure adequacy of training. No
such reviews have been completed. The FLETAP component responsible
for the Pacific Fleet consists of five full-time staff positions, four of
which were filled at the time of our visit there. Its Atlantic Fleet
counterpart has four authorized staff positions, three of which were
filled.

The NTSC report also identified numerous other nonformal or
noncentralized evaluation and evaluation-related activities within the
Navy's training community. However, NTSC found that the quality of current
Navy classroom training cannot be readily ascertained for the vast
majority of courses; that there is a general lack of technical
evaluation/assessment skills; and that current evaluation activities are
fractionated, not comprehensive, and operating in an environment of
obsolete instructions and unclear objectives. NTSC concluded that the
fleet's mandate to provide useful data to the training community about the
performance of its graduates needed to be enforced and that fleet
evaluation activities should be upgraded and appropriately staffed. It also
recommended that internal training appraisal responsibility be
decentralized to the training site level and that independent external
programs be reviewed for technical adequacy and integrated into an overall
systematic approach.

In response to these reports, a three-person team has recently been
established at the headquarters of the Chief of Naval Education and
Training to review the NTSC proposals and recommend an integrated
training appraisal program. No firm timetable has yet been established
for the team's report, but they anticipate providing a proposal in the
summer of 1990. We welcome this Navy effort, but we question whether
this response will prove adequate in view of the severity and
extensiveness of the problems NTSC has documented.






Air Force 



Sources of Individual Field Performance Data

We considered sources of individual-level data for field performance of
Air Force personnel equivalent to those we considered for the Navy,
that is, promotion examinations and supervisory surveys. After inter-
viewing Air Force personnel, however, we concluded that neither was
appropriate for our purposes.

Unlike the Navy's Level II surveys, the Air Force supervisory surveys
are still in use. They are conducted by the training sites' evaluation
units for each training course at 2- to 3-year intervals. Questionnaires
are sent to the supervisors of recent training graduates to determine
how frequently the graduates perform each of the major tasks for which
they were trained, and how well they perform them. A summary training
evaluation report is produced from these data identifying task-specific
training deficiencies and/or unnecessary training. We were informed
that the individual-level data collected by these surveys are not main-
tained by the training sites after their reports have been prepared.
Therefore, no individual data exist that would allow us to perform anal-
yses equivalent to those we performed using the Army SQT data.



Other Evaluation-Related Activities

Other training assessment procedures exist, including training quality
reports, utilization and training workshops, and occupational survey
reports. Training quality reports provide a means for supervisors of
recent training-site graduates to report apparent deficiencies in a
recruit's training. Like the Navy's FLETAP activities, these reports are
part of a reactive evaluation process. A succession of training quality
reports for a given course can lead to a complete course review. The
other activities are more concerned with front-end analysis. Occupa-
tional survey reports on occupational specialties are prepared approxi-
mately every three to four years. They are based on questionnaires
designed to define the major tasks performed by specialists and their
relative frequency. Utilization and training workshops are held when
the job requirements of an old occupational specialty change dramati-
cally or when a new specialty is defined. Major command functional
officers, training staff officers, and managers at the Air Force technical
schools participate by examining data from occupational survey reports
and identifying the specific training requirements of the specialty.






Alternative Data Sources: The Job Performance Measurement Project

A key impediment to establishing a field evaluation component of
training assessment is the expense of developing, testing, and adminis-
tering measures that validly and reliably measure actual performance.
Since the early 1980's, a major effort to address these measurement
issues has been under way under the direction of the Office of Accession
Policy of the Office of the Assistant Secretary of Defense for Force Man-
agement and Personnel. Known as the Joint-Service Job Performance
Measurement (JPM) project, the effort was initiated at the request of the
Congress to validate ASVAB measures against actual performance in the
field, instead of against training grades, which had been the sole crite-
rion. The project was triggered by the discovery of the ASVAB mis-
norming in the late 1970's, which unintentionally allowed some 300,000
less qualified recruits into the services and resulted in field com-
manders' complaints of quality deterioration among their personnel. JPM,
in other words, was directed toward testing the connection between the
first and third points in our model: test data collected for selection and
classification purposes at recruitment, and field performance data. JPM
did not set out to establish a link between classroom performance and
field performance.

JPM concluded that suitable measures of field performance did not exist,
and undertook to develop them. Over several years, some highly reliable
hands-on performance tests were developed and administered for 25
occupational specialties across the four services. Surrogates for hands-
on testing were also developed, including more traditional job-knowl-
edge tests and performance ratings. JPM concluded that AFQT reliably
predicted differences in levels of actual field performance, and that
these differences tended to persist through a recruit's enlistment. JPM,
however, has not reported any analyses of sex- or race-related differ-
ences. Because of its ASVAB orientation, the project also has not
addressed the issue of the classroom/field-performance connection.

JPM performance measures were expensive to develop and frequently
costly to administer, and they therefore may not be suitable for more
routine use as measures of training effectiveness. However, the invest-
ment made to develop these measures and their surrogates could prove
more profitable if some of the measures developed and the lessons
learned in the JPM effort were more widely applied to the development
of realistic assessment procedures for training.

Summary and 
Conclusions 



Our third evaluation question asked to what extent the services' selec-
tion criteria and training evaluation measures predict success in high
technology roles. While we identified a multitude of evaluation-related
activities in the three services, we nevertheless concluded that insuffi-
cient data existed for us to respond to this question. Army SQT data can
be adapted for this purpose, but neither the Navy nor the Air Force rou-
tinely collects and maintains field performance data to evaluate indi-
vidual-level training effectiveness.



Our analysis of Army SQT data was hindered by the limited size of the
sample. We were able to derive some preliminary conclusions, how-
ever: namely, that classroom performance, as measured by SQT, is a
moderately strong indicator of future field performance for males, but
not for females, and that ASVAB can predict SQT's moderately well for
white male recruits, but is apparently unrelated to SQT scores achieved
by women and minorities. These ASVAB/SQT findings are consistent with
the pattern of ASVAB/course-grade relationships we discussed in the pre-
vious chapter.

The lack of other objective, systematically collected field evaluation
data renders meaningful evaluation of training effectiveness impossible.
Decisionmakers, whether they are in the Congress, DOD, or the indi-
vidual services, can only react to problems in the field after they have
become apparent and have been identified as training-related. However,
given the cost and complexity of today's military equipment, it is imper-
ative that the services possess adequate evaluative data to monitor how
well personnel are being prepared to use and maintain these weapons.






Chapter 5

Summary, Recommendations, and Agency 
Comments and Our Response 



Summary

This report has addressed three evaluation questions:

• How has the aptitude of recruits for technologically sophisticated spe- 
cialties changed since 1980? 

• How useful are the data collected by the services before and during 
classroom training for selecting individuals for high technology roles 
and for evaluating the effectiveness of this training? 

• How well do the services' selection criteria and training evaluation mea- 
sures predict success in high technology roles? 

To respond to these questions, we examined the three essential types of 
information that could be used to assess the effectiveness of military 
training: (1) data collected at entry to the military for selection and 
assignment to an occupational specialty, (2) data on classroom measures 
of performance during formal training, and (3) data on individual field 
performance. Our analysis has been set in the context of a recruit pool
shifting toward a much higher representation of women and minorities. 

To answer the first question, we examined ASVAB scores during the
1980's and found that (1) most gains in recruit quality occurred in the
first half of the decade, (2) technical abilities of recruits have begun to
decline, and (3) women and minorities continue to score lower on tech-
nical measures than white males. These findings suggest that an
increased burden will be placed on the services' training establishments
to assure the technical competence of their future graduates. The ser-
vices' response may also need to include more demographically sensitive
training and/or additional compensatory training to raise basic skill
levels.

Our response to the second question involved an analysis of classroom
grades from thirteen technical courses. Our findings indicated that (1)
some deficiencies exist in the Army's computerized grading system; (2)
during training women and minorities overcome their initially lower
technical scores in the Navy and Air Force, but not in the Army; (3)
classroom success appears more related to a general ability level as mea-
sured by ASVAB than to the Electronics Composite score currently in use,
particularly for women; and (4) ASVAB's ability to predict classroom suc-
cess for minorities is weak.

The last three findings are interrelated. In the Navy and Air Force,
unlike the Army, women entered training with significantly higher AFQT
scores than men. In addition, the gap in AFQT scores between whites and
nonwhites was twice as large for Army trainees as for their Navy and
Air Force counterparts. Based on these findings, we concluded that the
services should consider developing a more general ASVAB derivative,
such as our factor score, to assign women and minorities to technical
training.

We found that there was insufficient evidence to attribute the weak
relationship between ASVAB and course grades for women and minorities
either to problems with ASVAB or to factors in the training environment.
Yet, whatever its source, the relative inconsistency of the two measures
exists and should be addressed by both the recruiting and training
communities.

In response to the third question, we examined post-classroom measures
of training effectiveness. We concluded that (1) only the Army routinely
collects data on individual field performance useful for training evalua-
tion purposes; (2) on the basis of these Army data, ASVAB scores are even
weaker predictors of field performance for women and minorities than
of classroom success; and (3) the Navy's training evaluation component
is in need of more intense review and reform than it is currently
receiving.

In summary, we found serious weaknesses or gaps at each of the data 
points required by the evaluation model posited in chapter 1. Of these,
the most serious deficiency is the inability of the Air Force and Navy to
base their evaluation of their selection procedures and classroom
training on systematically collected, objective field performance data.
Without the ability to test the "fit" of these data points with one 
another, the services are not able to maximize their training effective- 
ness, or even to estimate realistically how successful their training 
investment is in producing skilled operators and maintainers of 
today's — and tomorrow's — sophisticated weaponry. 



Recommendations

We believe that evaluating the effectiveness of the training provided by
the services is crucial if they are to meet the future challenges of
changing recruit demographics and increasingly sophisticated weap-
onry. Therefore, we make the following recommendations for action at
each of the three information collection points that we consider essential
to adequate training evaluation: (1) that the Office of Force Manage-
ment and Personnel direct the personnel research it coordinates among
the individual services to identify more sensitive predictors of classroom
performance for women and minority students from the ASVAB data it
already possesses; (2) that the Secretary of the Army direct the Training
and Doctrine Command to review the classroom grading procedures
identified within the report as deficient, for their accuracy, appropriate-
ness, and reliability; (3) that the Secretary of the Navy establish a firm
deadline for developing a training evaluation program and that he direct
that the adequacy of current resources allocated to this effort be reex-
amined. Finally, we recommend that the Assistant Secretary of Defense
for Force Management and Personnel review alternative measures of
field performance already developed by the services under the Job Per-
formance Measurement project for their potential applicability to
training and on-the-job performance evaluation.

Our purpose in this study has been to review the ability of the services
to monitor, evaluate, and (where necessary) adjust training to changes
in the demographics and technical ability of the recruit pool and to the
technical sophistication of weapons systems. Whatever changes in our
military posture are occasioned by shifts in the nature of threats to our
national security, we believe that accurate information relating to the
recruit pool, to the effectiveness of military training, and to on-the-job
performance will continue to be essential to the mission of our armed
forces.



Agency Comments and Our Response

In its written response to a draft of this report, DOD concurred with all of
its recommendations and identified specific actions to be taken toward
implementing them. DOD also concurred or partially concurred with what
it identified as the main findings contained in the report. DOD also raised
some technical methodological questions and offered some thoughtful
interpretations of our findings. (See appendix V.) We have reviewed
these comments and, where appropriate, have made changes to the text.

DOD generally agreed with our description of changes in recruits' ASVAB
scores during the past decade. It commented, however, that it would be
inappropriate to define a recruit's technological sophistication merely as
his or her Electronics Composite score. We agree that this would be a
very limited definition, and for this reason our report encouraged the
development of better predictors of success in more technologically
demanding occupational specialties. DOD's speculation that the decline in
Electronics Information scores is attributable to a decline in technical
vocational education in high schools is persuasive. It could as well have
speculated that the lower Electronics Composite scores of women
recruits are attributable to their traditionally lower enrollment in such
courses.

DOD generally concurred with our analysis of classroom grades and their
relationship to ASVAB predictors. However, it questioned the appropriate-
ness of some of our procedures. DOD summarized its methodological con-
cerns as (1) inappropriate pooling of grades from courses with different
metrics, (2) implausibly high factor scores after correction for restric-
tion in range, (3) lack of detailed regression analyses for differences
between subgroups, and (4) small sample sizes for subgroups.

DOD incorrectly assumes that we simply pooled raw course grades from
different courses. Before performing correlation analyses, we standard-
ized course grades to a common metric to adjust for any differences
between courses in grading procedures. We have also added to the draft
we provided DOD parallel tables of results on the individual-course level.
(See appendixes II and III.)
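
The standardization step described above can be illustrated with a minimal
sketch. It assumes z-score standardization within each course; the report
says only that grades were put on a common metric, so the method, the
function name, and the course names and grades below are all hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_course(grades_by_course):
    """Convert raw grades to z-scores separately for each course.

    Assumption: the "common metric" is a within-course z-score.
    """
    z_by_course = {}
    for course, grades in grades_by_course.items():
        m = mean(grades)
        s = pstdev(grades)
        if s == 0:
            # No variation: every trainee sits exactly at the course mean.
            z_by_course[course] = [0.0 for _ in grades]
        else:
            z_by_course[course] = [(g - m) / s for g in grades]
    return z_by_course


# Hypothetical grades recorded on two different scales.
raw = {
    "course_A": [70.0, 80.0, 90.0],  # percentage scale
    "course_B": [3.0, 3.5, 4.0],     # four-point scale
}
z = standardize_within_course(raw)

# After standardization the two courses can be pooled for correlation work.
pooled = [score for scores in z.values() for score in scores]
```

Because each course is centered and scaled on its own distribution, a
trainee's pooled score reflects standing relative to classmates rather than
the grading scale of a particular course.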

We share DOD's concern about the apparently inflated values of the
adjusted validity coefficients for factor scores, but we disagree with
their speculation that inappropriate statistical procedures are the source
of this inflation. We applied the same conventional adjustment proce-
dures to all three scores (AFQT, Electronics Composite, and factor
scores) and, as DOD comments, for the first two scores our results "are
consistent with other analyses." As we stated in the draft report, the
factor scores were based on the ASVAB norm group correlation matrix
provided us by DOD. Having performed a principal-components analysis
of these data, we applied the resultant scoring coefficients to our sample
to obtain factor scores. This procedure ideally offers two advantages.
First, it bases the correlation analysis on a norm group presumably
closer to the universe of applicants to military service than our sample
of relatively high-scoring recruits. Second, it permits adjustment for
restriction of range.

After thorough reexamination of our procedures and the data to which
they were applied, we concluded that the results of factor analysis of
the DOD correlation matrix should not be applied to our sample because
of differences between the two samples in the magnitude of subtest
intercorrelations. DOD reported substantially higher intercorrelations
than were present in our sample. As a result, the variance of our
sample's factor scores, when based on the DOD correlations, was inappro-
priately restricted, and the adjustment for range restriction was overes-
timated. (All other things being equal, the smaller the sample variance,
the greater the adjustment for restriction in range.)
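
The parenthetical point above can be illustrated with a short sketch. It
uses one widely known correction for direct range restriction (often called
Thorndike's Case II formula). The report does not say which formula was
applied, so this choice, the function name, and the numbers below are
assumptions made only for illustration.

```python
import math


def correct_for_range_restriction(r, sd_sample, sd_reference):
    """Adjust a correlation observed in a range-restricted sample.

    Assumed formula (Thorndike Case II):
        r_c = r*u / sqrt(1 + r^2 * (u^2 - 1)),  u = sd_reference / sd_sample
    """
    if sd_sample <= 0 or sd_reference <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_reference / sd_sample
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Hypothetical values: the same observed correlation, two sample spreads.
observed_r = 0.30
reference_sd = 20.0

mild = correct_for_range_restriction(observed_r, 15.0, reference_sd)
strong = correct_for_range_restriction(observed_r, 10.0, reference_sd)

# The narrower sample (SD of 10) receives the larger upward adjustment.
print(round(mild, 3), round(strong, 3))
```

Under this formula, an understated sample standard deviation inflates the
ratio u and therefore the corrected coefficient, which is the mechanism the
paragraph above describes.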






We therefore have recalculated our factor scores, deriving them from a
principal-component analysis of our sample's ASVAB scores rather than
from an analysis of the norm-group correlation matrix provided by DOD.
Consequently, no adjustment for restriction of range would be appro-
priate for these scores. While the correlations of these factor scores with
our criterion measures vary somewhat from those originally reported
(being in some cases higher and in others lower), the slight differences in
no way affect the conclusion that we reached in the draft report and
with which DOD has agreed in both written and oral comments, namely,
that a broader-based measure than the simple composites currently in
use would provide a valuable predictor of classroom performance.
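
A minimal sketch of deriving a general score from a sample's own subtest
scores follows. It assumes the first principal component of the sample
correlation matrix, found here by power iteration, and returns unscaled
component scores. The report does not give its computational details, so
the function names and the subtest data are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardize each subtest (column) to mean 0 and SD 1 in the sample."""
    n = len(rows)
    z_cols = []
    for col in zip(*rows):
        m = sum(col) / n
        s = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        z_cols.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*z_cols)]


def correlation_matrix(z):
    """Correlations among standardized columns."""
    n, k = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / n for j in range(k)]
            for i in range(k)]


def first_component(corr, iterations=500):
    """Dominant eigenvector of the correlation matrix via power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def factor_scores(rows):
    """Weight each person's standardized subtests by the first component."""
    z = zscore_columns(rows)
    weights = first_component(correlation_matrix(z))
    return [sum(w * x for w, x in zip(weights, row)) for row in z]


# Hypothetical scores for five trainees on three subtests.
sample = [[50, 52, 48], [55, 57, 53], [60, 59, 61], [45, 47, 44], [65, 63, 66]]
scores = factor_scores(sample)
```

Because the weights come from the sample itself, the resulting scores
carry the sample's own variance, which is consistent with the statement
above that no further range adjustment would be appropriate.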

DOD cites the absence of certain regression-related statistics (intercepts,
regression coefficients, and standard errors of estimates) and the small
sample size in some subgroups as reasons for not "generalizing to other
samples" or "making policy decisions" on the basis of our report. First,
for simple bivariate relationships such as we analyzed (ASVAB versus
course grades or SQT), our detailed reporting of means, N's, correlation
coefficients, and significance levels serves essentially the same function
as these equivalent regression statistics. We would, however, gladly pro-
vide our data base to DOD for alternative analysis. Second, we repeatedly
draw the reader's attention to the problem of small sample size in some
subgroups. Most importantly, we strongly agree that, unless they are
replicated on larger samples, our analyses should not be the basis for
significant policy shifts in selection and classification of recruits.
Rather, we recommended (and DOD concurred) that the services attempt
to develop more sensitive predictors of training success for minorities
and women. (Indeed, one of the main strengths of our work here is that
it determined the insensitivity to these populations of current
predictors.) Should the results of these efforts prove successful, policy
changes would then be appropriate.

The Army found "neither surprising nor particularly disturbing" the
fact that we were not able to use many of the test scores they provided
for some courses because they do not discriminate among soldiers' per-
formances. We would point out that (1) the same software and report
formats are used to assign scores to trainees in these courses as in other
similar courses where we found usable scores; (2) we were able for some
of these cases to reanalyze the individual measures and derive mean-
ingful scores; and (3) the Army assigns and maintains rank-in-class sta-
tistics for each graduate of these courses on the basis of this software,
thus itself implicitly measuring and recording the relative performance
of individuals. While our ability to perform correlational analyses may
not be a critical need, in our opinion the Army's ability to perform objec-
tive evaluations of the effectiveness of its courses is. We therefore wel-
come the concurrence of the Army in our recommendation to review its
testing procedures for the courses we identified.

DOD commented on our review of field measures of training effectiveness
for each of the services, asserting that our negative view of ASVAB scores
as a predictor of performance for female and minority soldiers was con-
trary to research on predicting training success. Not only does DOD pro-
vide no specifics on this research but also, and more importantly, it is
not clear how predicting training outcomes is directly relevant to the
issue of field performance. Of more interest are the preliminary results
reported from ongoing research by the Army Research Institute. These
results suggest a fairly strong relationship for women and a somewhat
weaker, but still significant, relationship for blacks between ASVAB and
SQT in larger occupational specialties. The Army appears to concede that
these results may not be true for smaller, more technical specialties,
such as the ones we examined. What is most noteworthy about the
Army's response, however, is its capability to perform these analyses of
field performance routinely, a capability that the Navy and Air Force do
not share.

The Navy supplied some information on recent steps being taken to
enhance training evaluation methods in addition to the ones we identi-
fied in the report. The Air Force commented that they do not have SQT's
and do not plan to introduce them in the near future. It noted that
"testing, recording, and documenting individual performance for statis-
tics is very time-consuming, requires additional manpower, and is cost-
prohibitive." It would be difficult to agree with the Air Force that deter-
mining the effectiveness of individual performance is merely a statis-
tical endeavor, or even that it is an optional one. Rather, it lies at the
core of our ability to know how well we are prepared for meeting critical
defense challenges. Indeed, given the cost and complexity of today's mil-
itary equipment, it is imperative that all the services possess adequate
evaluative data to monitor how well personnel are being trained to use
and maintain these weapons. Our report does not propose the introduc-
tion of SQT's into other services, nor does it attempt to determine the
cost-effectiveness of SQT's. It does, however, assert the need for objec-
tive, systematically collected information on individual field perform-
ance in all services.






Finally, DOD noted that it had directly addressed the applicability of les-
sons learned from the Joint-Service Job Performance Measurement Pro-
gram in 1985, but had deferred implementing any training-related
application of these measures at that time. DOD states that it will explore
the feasibility of such an application once again.






Appendix I 

AFQT Mean Score and Electronics Composite
Summary Statistics: 1981-89 



Table I.1: AFQT Mean Scores, by Gender (a)

              Male              Female
Year    Number    Mean    Number    Mean
1981   163,571  203.95    22,886  202.95
1982   222,726  206.26    30,311  209.10
1983   227,161  209.51    32,546  211.57
1984   226,975  210.36    32,026  211.15
1985   222,772  211.55    35,368  211.43
1986   254,030  211.94    37,175  212.73
1987   239,122  212.17    35,385  212.42
1988   213,493  212.64    32,682  212.04
1989   217,783  211.83    35,984  211.78

(a) Sum of subtest standard scores.



Table I.2: AFQT Mean Scores, by Service (a)

              Army              Navy            Air Force        Marine Corps
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981    76,284  195.52    47,715  208.61    37,389  213.12    25,069  206.16
1982   108,063  201.73    55,182  210.06    57,442  212.86    32,350  205.84
1983   121,112  206.07    55,256  212.52    51,771  216.72    31,568  207.78
1984   118,287  207.07    57,214  211.85    50,235  218.45    33,265  207.67
1985   111,625  209.30    59,604  211.92    57,617  217.08    29,294  208.34
1986   125,918  210.33    68,891  210.30    62,372  217.08    34,024  211.44
1987   120,538  210.73    66,078  210.75    54,371  218.10    33,520  210.90
1988   102,709  210.88    69,080  211.58    40,087  219.94    34,299  210.93
1989   106,126  209.42    73,272  210.40    42,247  220.59    32,122  211.45

(a) Sum of subtest standard scores.






Table I.3: AFQT Mean Scores, by Race/Ethnicity (a)

              White             Black           Hispanic            Other
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981   138,431  209.27    35,666  186.56     6,904  191.00     5,456  194.95
1982   189,134  211.48    48,377  190.86     8,569  193.97     6,957  198.91
1983   196,585  214.19    47,540  194.54     8,616  198.71     6,966  202.54
1984   193,193  215.07    48,500  194.99     9,439  199.46     7,869  204.15
1985   190,243  215.79    49,663  197.97     9,504  202.32     8,730  205.88
1986   212,661  215.94    56,150  199.20    12,059  204.26    10,335  206.74
1987   198,130  216.62    54,166  198.67    13,708  205.00     8,503  207.42
1988   174,501  217.16    50,370  199.14    13,567  205.92     7,737  207.84
1989   177,111  216.40    53,409  199.07    15,499  205.92     7,748  206.97

(a) Sum of subtest standard scores.



Table I.4: AFQT Mean Score Overall Totals (a)

           Overall total
Year    Number  Mean (b)
1981   186,457    203.83
1982   253,037    206.60
1983   259,707    209.77
1984   259,001    210.41
1985   258,140    211.53
1986   291,205    211.90
1987   274,507    212.21
1988   246,175    212.56
1989   253,767    211.82

(a) Sum of subtest standard scores.
(b) Standard deviation = 20.66.
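
The overall means in this appendix can be cross-checked against the
subgroup figures, since an overall mean is the count-weighted average of
the subgroup means. The sketch below is illustrative only: the function
name is hypothetical, and the inputs are the 1981 male and female AFQT
figures read as 163,571 (mean 203.95) and 22,886 (mean 202.95).

```python
def pooled_mean(groups):
    """Count-weighted mean of subgroup means.

    groups: iterable of (number, mean) pairs.
    """
    groups = list(groups)
    total_n = sum(n for n, _ in groups)
    if total_n == 0:
        raise ValueError("at least one nonempty group is required")
    return sum(n * m for n, m in groups) / total_n


# 1981 AFQT figures by gender, as read from the appendix tables.
groups_1981 = [(163_571, 203.95), (22_886, 202.95)]

total_1981 = sum(n for n, _ in groups_1981)
overall_1981 = pooled_mean(groups_1981)

# Compare with the 1981 overall row: 186,457 recruits, mean 203.83.
print(total_1981, round(overall_1981, 2))
```

The same check can be run against the by-service and by-race tables for
any year, which is a convenient way to spot a mistranscribed count or mean.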






Table I.5: Electronics Composite Mean Scores, by Gender (a)

              Male              Female
Year    Number    Mean    Number    Mean
1981   163,571  207.89    22,886  194.41
1982   222,726  210.00    30,311  199.18
1983   227,161  212.91    32,546  [illegible]
1984   226,975  213.46    32,026  201.40
1985   222,772  212.70    35,368  199.57
1986   254,030  211.76    37,175  200.57
1987   239,122  212.17    35,385  200.57
1988   213,493  212.73    32,682  199.43
1989   217,783  211.50    35,984  199.97

(a) Sum of subtest standard scores.












Table I.6: Electronics Composite Mean Scores, by Service (a)

              Army              Navy            Air Force        Marine Corps
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981    76,284  198.22    47,715  209.76    37,389  215.75    25,069  208.27
1982   108,063  204.03    55,182  210.33    57,442  215.24    32,350  207.90
1983   121,112  207.92    55,256  212.16    51,771  218.34    31,568  210.00
1984   118,287  208.56    57,214  211.69    50,235  219.87    33,265  209.70
1985   111,625  208.66    59,604  209.66    57,617  216.77    29,294  208.17
1986   125,918  208.73    68,891  207.32    62,372  215.48    34,024  209.30
1987   120,538  208.79    66,078  208.55    54,371  217.21    33,520  209.36
1988   102,709  209.11    69,080  208.71    40,087  219.01    34,299  209.53
1989   106,126  207.19    73,272  207.29    42,247  218.69    32,122  209.65

(a) Sum of subtest standard scores.






Table I.7: Electronics Composite Mean Scores, by Race/Ethnicity (a)

              White             Black           Hispanic            Other
Year    Number    Mean    Number    Mean    Number    Mean    Number    Mean
1981   138,431  212.47    35,666  186.45     6,904  193.40     5,456  197.91
1982   189,134  214.51    48,377  190.01     8,569  196.37     6,957  201.33
1983   196,585  216.81    47,540  193.24     8,616  200.93     6,966  204.31
1984   193,193  217.53    48,500  193.49     9,439  201.35     7,869  206.24
1985   190,243  216.28    49,663  193.94     9,504  202.50     8,730  205.87
1986   212,661  215.50    56,150  194.11    12,059  203.07    10,335  205.78
1987   198,130  216.19    54,166  193.50    13,708  203.76     8,503  207.23
1988   174,501  216.86    50,370  194.08    13,567  204.54     7,737  207.08
1989   177,111  215.64    53,409  193.46    15,499  203.66     7,748  206.57

(a) Sum of subtest standard scores.



Table I.8: Electronics Composite Mean Score Overall Totals (a)

           Overall total
Year    Number  Mean (b)
1981   186,457    206.04
1982   253,037    208.44
1983   259,707    211.15
1984   259,001    [illegible]
1985   258,140    [illegible]
1986   291,205    [illegible]
1987   274,507    210.47
1988   246,175    210.67
1989   253,767    209.45

(a) Sum of subtest standard scores.
(b) Standard deviation = 22.19.






Appendix II 

Predictor and Criterion Variable Mean Scores 



Table II.1: Army Mean Scores

                                 Electronics
                AFQT (a)        Composite (a)     Course grade         SQT (b)
Category      Mean  Number      Mean  Number      Mean  Number      Mean  Number
24J         227.87      65    234.75      65     86.75      76     82.58      53
27N         225.73     100    232.85     100     88.78     138     83.95     110
29V         238.22     136    242.92     136     93.55      41     76.98      65
Male        232.14     280    238.46     280     89.23     232     82.12     209
Female      232.87      23    230.13      23     80.31      23     77.52      21
White       234.00     255    240.00     255     90.19     160     81.86     144
Nonwhite    222.67      48    226.29      48     86.86      95     81.45      86
All Army    232.20     303    237.83     303     88.94     255     81.70     230

(a) Sum of subtest standard scores.
(b) Score on Skills Qualification Test.



Table II.2: Navy Mean Scores

                                    Electronics
                  AFQT^a            Composite^a        Course grade
Category       Mean   Number      Mean   Number      Mean   Number
AO           228.10      783    233.13      783     89.72      833
AX           231.64      392    236.16      392     90.64      469
STG          228.57    3,233    234.43    3,233     90.23    3,418
STS          231.87    1,698    237.47    1,698     86.89    1,723
Male         229.59    6,080    235.33    6,080     89.11    5,882
Female       235.59       76    230.65       76     90.70       71
White        230.49    5,355    236.25    5,355     89.20    5,179
Nonwhite     224.18      801    228.74      801     89.57    [illegible]
All Navy     229.67    6,156    235.27    6,156     89.30    6,443

^aSum of subtest standard scores





Table II.3: Air Force Mean Scores

                                          Electronics
                       AFQT^a             Composite^a         Course grade
Category            Mean    Number      Mean    Number      Mean    Number
45530A            235.53       119    240.72       119     90.17       119
45530B       [illegible]       231    240.55       231     90.82       231
30332             238.12       212    245.00       212     91.77       227
30333             234.15       360    239.77       360     91.31       377
Male         [illegible]       824    241.94       824     91.31       854
Female       [illegible]        98  [illegible]     98     89.91       100
White             236.22       825    241.95       825     91.21       855
Nonwhite          231.09        97    235.73        97     90.76        90
All Air Force     235.68       922    241.29       922     91.16       954

^aSum of subtest standard scores






Appendix III 



Intercorrelation of Study Variables by
Occupational Specialty



Table III.1: Intercorrelation of Study Variables: Army, 24J^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.79^g        0.83^g     0.31^g     0.49
  Electronics Composite    65          1.00          0.81^g     0.32^g     0.33
  Factor                   65          65            1.00       0.40^g
  Grade                    59          59            59         1.00
Male
  AFQT                     1.00        0.82^g        0.85^g     0.29^g     0.47
  Electronics Composite    55          1.00          0.79^g     0.28^g     0.30
  Factor                   55          55            1.00       0.38^g
  Grade                    50          50            50         1.00
Female
  AFQT                     1.00        0.81^g        0.89^g     0.43       0.63
  Electronics Composite    10          1.00          0.88^g     0.15       0.15
  Factor                   10          10            1.00       0.21
  Grade                    9           9             9          1.00
White
  AFQT                     1.00        0.82^g        0.80^g     0.24       0.39
  Electronics Composite    49          1.00          0.79^g     0.27       0.29
  Factor                   49          49            1.00       0.42^g
  Grade                    44          44            44         1.00
Nonwhite
  AFQT                     1.00        [illegible]   0.80^g     0.13       0.23
  Electronics Composite    16          1.00          [illegible] 0.15      0.16
  Factor                   16          16            1.00       [illegible]
  Grade                    15          15            15         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05
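
The tables in this appendix give each predictor-grade correlation both raw and adjusted for restriction of range, but the table notes do not say which correction formula was applied. The sketch below assumes the common univariate correction (often called Thorndike's Case 2); that choice, the function name, and the ratio u (standard deviation of the predictor in the unrestricted applicant pool divided by its standard deviation in the selected sample) are illustrative assumptions, not details taken from the report.

```python
import math


def correct_for_range_restriction(r_restricted: float, u: float) -> float:
    """Assumed Thorndike Case 2 correction (not confirmed by the report).

    r_restricted: correlation observed in the selected (restricted) sample.
    u: SD of the predictor in the unrestricted pool divided by its SD in
       the restricted sample (hypothetical input; u > 1 means selection
       narrowed the range of scores).
    """
    if u <= 0:
        raise ValueError("u must be positive")
    r = r_restricted
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


if __name__ == "__main__":
    raw = 0.36  # a raw correlation of the size reported in these tables
    for u in (1.0, 1.5, 2.0):  # hypothetical SD ratios
        adjusted = correct_for_range_restriction(raw, u)
        print(f"u = {u:.1f}: raw r = {raw:.2f}, adjusted r = {adjusted:.2f}")
```

With u = 1 (no restriction) the adjusted value equals the raw value; larger ratios raise the adjusted correlation, which is the direction of every raw-to-adjusted pair in these tables.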






Table III.2: Intercorrelation of Study Variables: Army, 27N^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.84^g        0.85^g     0.36^g     0.55^g
  Electronics Composite    100         1.00          0.92^g     0.53^g     0.57^g
  Factor                   100         100           1.00       0.48^g
  Grade                    95          95            95         1.00
Male
  AFQT                     1.00        0.85^g        0.85^g     0.39^g     0.59^g
  Electronics Composite    94          1.00          0.93^g     0.52^g     0.56^g
  Factor                   94          94            1.00       0.48^g
  Grade                    89          89            89         1.00
Female
  AFQT                     1.00        0.86^g        0.82^g     0.84^g     0.94^g
  Electronics Composite    6           1.00          0.96^g     0.88^g     0.93^g
  Factor                   6           6             1.00       0.90^g
  Grade                    6           6             6          1.00
White
  AFQT                     1.00        0.82^g        0.82^g     0.31^g     0.49^g
  Electronics Composite    85          1.00          0.90^g     0.49^g     0.52^g
  Factor                   85          85            1.00       0.43^g
  Grade                    81          81            81         1.00
Nonwhite
  AFQT                     1.00        0.80^g        0.81^g     0.31       0.49
  Electronics Composite    15          1.00          0.93^g     0.65^g     0.69^g
  Factor                   15          15            1.00       0.62^g
  Grade                    14          14            14         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.3: Intercorrelation of Study Variables: Army, 29V^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.74^g        0.79^g     0.20       0.33
  Electronics Composite    136         1.00          0.88^g     0.50^g     0.53^g
  Factor                   136         136           1.00       0.38^g
  Grade                    35          35            35         1.00
Male
  AFQT                     1.00        0.75^g        0.80^g     0.25       0.41
  Electronics Composite    129         1.00          0.88^g     0.47^g     0.50^g
  Factor                   129         129           1.00       0.36^g
  Grade                    32          32            32         1.00
Female
  AFQT                     1.00        0.83^g        0.80^g     0.59       0.78
  Electronics Composite    7           1.00          0.90^g     0.79       0.84
  Factor                   7           7             1.00       0.57
  Grade                    3           3             3          1.00
White
  AFQT                     1.00        0.74^g        0.78^g     0.20       0.33
  Electronics Composite    119         1.00          0.87^g     0.53^g     0.56^g
  Factor                   119         119           1.00       0.40^g
  Grade                    29          29            29         1.00
Nonwhite
  AFQT                     1.00        0.76^g        0.85^g     0.18       0.31
  Electronics Composite    17          1.00          0.86^g     0.34       0.36
  Factor                   17          17            1.00       0.23
  Grade                    6           6             6          1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.4: Intercorrelation of Study Variables: Navy, AO^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.83^g        0.85^g     0.25^g     0.40^g
  Electronics Composite    783         1.00          0.85^g     0.27^g     0.29^g
  Factor                   783         783           1.00       0.25^g
  Grade                    774         774           774        1.00
Male^h
  AFQT                     1.00        0.83^g        0.85^g     0.25^g     0.40^g
  Electronics Composite    783         1.00          0.85^g     0.27^g     0.29^g
  Factor                   783         783           1.00       0.25^g
  Grade                    774         774           774        1.00
White
  AFQT                     1.00        0.83^g        0.84^g     0.25^g     0.41^g
  Electronics Composite    665         1.00          0.86^g     0.28^g     0.30^g
  Factor                   665         665           1.00       0.27^g
  Grade                    656         656           656        1.00
Nonwhite
  AFQT                     1.00        0.82^g        0.86^g     0.13       0.22
  Electronics Composite    118         1.00          0.83^g     0.16       0.17
  Factor                   118         118           1.00       0.07
  Grade                    118         118           118        1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05
^hWomen are prohibited from serving in the Navy's AO occupational specialty.






Table III.5: Intercorrelation of Study Variables: Navy, AX^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.81^g        0.83^g     0.41^g     0.61^g
  Electronics Composite    392         1.00          0.89^g     0.40^g     0.43^g
  Factor                   392         392           1.00       0.39^g
  Grade                    391         391           391        1.00
Male
  AFQT                     1.00        0.87^g        0.88^g     0.42^g     0.62^g
  Electronics Composite    321         1.00          0.90^g     0.43^g     0.46^g
  Factor                   321         321           1.00       0.41^g
  Grade                    320         320           320        1.00
Female
  AFQT                     1.00        0.75^g        0.80^g     0.39^g     0.58^g
  Electronics Composite    71          1.00          0.83^g     0.32^g     0.34^g
  Factor                   71          71            1.00       0.39^g
  Grade                    71          71            71         1.00
White
  AFQT                     1.00        0.80^g        0.83^g     0.44^g     0.65^g
  Electronics Composite    336         1.00          0.89^g     0.46^g     0.49^g
  Factor                   336         336           1.00       0.44^g
  Grade                    335         335           335        1.00
Nonwhite
  AFQT                     1.00        0.78^g        0.84^g     0.18       0.29
  Electronics Composite    56          1.00          0.87^g     0.02       0.02
  Factor                   56          56            1.00       0.07
  Grade                    56          56            56         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.6: Intercorrelation of Study Variables: Navy, STG^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.78^g        0.80^g     0.30^g     0.48^g
  Electronics Composite    3233        1.00          0.84^g     0.26^g     0.28^g
  Factor                   3233        3233          1.00       0.28^g
  Grade                    3123        3123          3123       1.00
Male^h
  AFQT                     1.00        0.78^g        0.80^g     0.30^g     0.48^g
  Electronics Composite    3233        1.00          0.84^g     0.26^g     0.28^g
  Factor                   3233        3233          1.00       0.28^g
  Grade                    3123        3123          3123       1.00
White
  AFQT                     1.00        0.79^g        0.80^g     0.31^g     0.49^g
  Electronics Composite    2791        1.00          0.84^g     0.28^g     0.29^g
  Factor                   2791        2791          1.00       0.30^g
  Grade                    2697        2697          2697       1.00
Nonwhite
  AFQT                     1.00        [illegible]   0.76^g     0.22^g     0.37^g
  Electronics Composite    442         1.00          0.78^g     0.16^g     0.16^g
  Factor                   442         442           1.00       0.12^g
  Grade                    426         426           426        1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05
^hWomen are prohibited from serving in the Navy's STG occupational specialty.






Table III.7: Intercorrelation of Study Variables: Navy, STS^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.76^g        0.78^g     0.28^g     0.45^g
  Electronics Composite    1698        1.00          0.85^g     0.26^g     0.27^g
  Factor                   1698        1698          1.00       0.26^g
  Grade                    1651        1651          1651       1.00
Male^h
  AFQT                     1.00        0.76^g        0.78^g     0.28^g     0.45^g
  Electronics Composite    1698        1.00          0.85^g     0.26^g     0.27^g
  Factor                   1698        1698          1.00       0.26^g
  Grade                    1651        1651          1651       1.00
White
  AFQT                     1.00        0.77^g        0.79^g     0.28^g     0.46^g
  Electronics Composite    1518        1.00          0.85^g     0.27^g     0.29^g
  Factor                   1518        1518          1.00       0.28^g
  Grade                    1477        1477          1477       1.00
Nonwhite
  AFQT                     1.00        0.70^g        0.68^g     0.27^g     0.44^g
  Electronics Composite    180         1.00          0.82^g     0.11       0.12
  Factor                   180         180           1.00       0.12
  Grade                    174         174           174        1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05
^hWomen are prohibited from serving in the Navy's STS occupational specialty.






Table III.8: Intercorrelation of Study Variables: Air Force, 45530A^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.74^g        [illegible] 0.22^g    0.36^g
  Electronics Composite    119         1.00          0.87^g     0.27^g     0.29^g
  Factor                   119         119           1.00       0.30^g
  Grade                    119         119           119        1.00
Male
  AFQT                     1.00        0.77^g        0.77^g     0.21^g     0.35^g
  Electronics Composite    99          1.00          0.86^g     0.26^g     0.28^g
  Factor                   99          99            1.00       0.27^g
  Grade                    99          99            99         1.00
Female
  AFQT                     1.00        0.69^g        0.63^g     0.31       0.49
  Electronics Composite    20          1.00          0.84^g     0.15       0.15
  Factor                   20          20            1.00       0.25
  Grade                    20          20            20         1.00
White
  AFQT                     1.00        0.75^g        0.73^g     0.24^g     0.39^g
  Electronics Composite    102         1.00          0.87^g     0.28^g     0.29^g
  Factor                   102         102           1.00       0.28^g
  Grade                    102         102           102        1.00
Nonwhite
  AFQT                     1.00        0.58^g        0.65^g     0.08       0.13
  Electronics Composite    17          1.00          0.85^g     0.22       0.23
  Factor                   17          17            1.00       0.33
  Grade                    17          17            17         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.9: Intercorrelation of Study Variables: Air Force, 45530B^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.70^g        0.72^g     0.22^g     0.36^g
  Electronics Composite    231         1.00          0.83^g     0.27^g     0.28^g
  Factor                   231         231           1.00       0.29^g
  Grade                    231         231           231        1.00
Male
  AFQT                     1.00        0.71^g        0.72^g     0.23^g     0.37^g
  Electronics Composite    215         1.00          0.84^g     0.25^g     0.27^g
  Factor                   215         215           1.00       0.29^g
  Grade                    215         215           215        1.00
Female
  AFQT                     1.00        0.81^g        0.83^g     0.15       0.26
  Electronics Composite    16          1.00          0.71^g     0.25       0.26
  Factor                   16          16            1.00       0.10
  Grade                    16          16            16         1.00
White
  AFQT                     1.00        0.70^g        0.72^g     0.25^g     0.40^g
  Electronics Composite    206         1.00          0.81^g     0.32^g     0.34^g
  Factor                   206         206           1.00       0.35^g
  Grade                    206         206           206        1.00
Nonwhite
  AFQT                     1.00        0.66^g        0.65^g     0.11       0.19
  Electronics Composite    25          1.00          0.90^g     0.05       0.06
  Factor                   25          25            1.00       0.04
  Grade                    25          25            25         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.10: Intercorrelation of Study Variables: Air Force, 30332^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.69^g        0.75^g     0.39^g     0.59^g
  Electronics Composite    212         1.00          0.81^g     0.41^g     0.43^g
  Factor                   212         212           1.00       0.43^g
  Grade                    212         212           212        1.00
Male
  AFQT                     1.00        0.74^g        0.78^g     0.41^g     0.61^g
  Electronics Composite    186         1.00          0.82^g     0.40^g     0.42^g
  Factor                   186         186           1.00       0.45^g
  Grade                    186         186           186        1.00
Female
  AFQT                     1.00        0.62^g        0.71^g     0.34       0.53
  Electronics Composite    26          1.00          0.79^g     0.48^g     0.50^g
  Factor                   26          26            1.00       0.31
  Grade                    26          26            26         1.00
White
  AFQT                     1.00        0.70^g        [illegible] 0.36^g    0.55^g
  Electronics Composite    190         1.00          0.81^g     0.41^g     0.43^g
  Factor                   190         190           1.00       0.42^g
  Grade                    190         190           190        1.00
Nonwhite
  AFQT                     1.00        0.56^g        0.70^g     0.62^g     0.81^g
  Electronics Composite    22          1.00          0.75^g     0.43^g     0.46^g
  Factor                   22          22            1.00       0.61^g
  Grade                    22          22            22         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Table III.11: Intercorrelation of Study Variables: Air Force, 30333^a

                                       Electronics              Grade^e
Category                   AFQT^b      Composite^c   Factor^d   Raw        Adjusted^f
Total
  AFQT                     1.00        0.72^g        0.77^g     0.32^g     0.50^g
  Electronics Composite    360         1.00          0.83^g     0.38^g     0.40^g
  Factor                   360         360           1.00       0.40^g
  Grade                    360         360           360        1.00
Male
  AFQT                     1.00        0.75^g        0.79^g     0.31^g     0.49^g
  Electronics Composite    324         1.00          0.84^g     0.39^g     0.41^g
  Factor                   324         324           1.00       0.34^g
  Grade                    324         324           324        1.00
Female
  AFQT                     1.00        0.58^g        0.78^g     0.50^g     0.70^g
  Electronics Composite    36          1.00          0.74^g     0.22       0.24
  Factor                   36          36            1.00       0.36^g
  Grade                    36          36            36         1.00
White
  AFQT                     1.00        0.71^g        0.77^g     0.34^g     0.53^g
  Electronics Composite    327         1.00          0.84^g     0.38^g     0.40^g
  Factor                   327         327           1.00       0.35^g
  Grade                    327         327           327        1.00
Nonwhite
  AFQT                     1.00        0.66^g        0.68^g     0.10       0.17
  Electronics Composite    33          1.00          0.70^g     0.43^g     0.46^g
  Factor                   33          33            1.00       0.43^g
  Grade                    33          33            33         1.00

^aCorrelation coefficients are in upper diagonal and number in lower diagonal.
^bAFQT = sum of subtest standard scores
^cElectronics Composite = sum of subtest standard scores for Electronics Composite
^dFactor = score from first factor from principal component analysis
^eGrade = final course grade
^fAdjusted = correlation adjusted for restriction of range
^gp < .05






Appendix IV 

Army SQT Mean Scores, by 
Occupational Specialty 



Specialty    Year      Number     Mean
24J          1985         154    86.48
             1986         152    87.11
             1987         102    82.50
             1988          92    [illegible]
             Total        500    85.23
27N          1985         196    85.53
             1986         157    88.36
             1987         145    86.66
             1988         185    79.56
             Total        683    84.81
26V/29V      1985       1,308    82.28
             1986       1,261    79.39
             1987         944    80.19
             1988         831    78.77
             Total      4,344    80.40






Appendix V 



Comments From the Department of Defense 




ASSISTANT SECRETARY OF DEFENSE

WASHINGTON, D.C. 20301-4000

FORCE MANAGEMENT
AND PERSONNEL

10 AUG 1990



Ms. Eleanor Chelimsky 
Assistant Comptroller General 

Program Evaluation and Methodology Division 
U.S. General Accounting Office 
441 G. Street, NW 
Washington, DC 20548 

Dear Ms. Chelimsky: 

This is the Department of Defense (DoD) response to the
General Accounting Office (GAO) draft report, "MILITARY TRAINING:
Effectiveness for Technical Specialties Inadequately Measured,"
dated May 31, 1990 (GAO Code 973276, OSD Case 8371).

The report provides a series of useful recommendations that
are consistent with ongoing DoD initiatives designed to develop
more sensitive indicators of trainee performance and to develop
more cost-effective ways of measuring performance both in the
schoolhouse and on-the-job. Despite general agreement with the
report's final recommendations, the DoD does not fully concur
with many of the specific findings. In several cases, the
findings and conclusions appear to be based on incorrect assumptions
or inappropriate methodology. Specific issues and details are
provided in the enclosure.

In addition, it is important to note that the field of job
performance measurement is still a developing science and
cost-effective measures for use in evaluating training effectiveness
are not yet available. As discussed in the enclosure, the DoD
has additional measurement programs in place beyond those
discussed in the report, and continues to support a substantial
number of research efforts to expand the boundaries of this
science. The GAO report substantiates the Department's
conclusions about the demands of selecting and training individuals to
meet the requirements of technical specialties in the coming
years, and reinforces current DoD efforts in this area.

The DoD appreciates the opportunity to comment on the draft
report.



Sincerely, 




Enclosure: 
As stated 






GAO DRAFT REPORT - DATED MAY 31, 1990
(GAO CODE 973276) OSD CASE 8371



"MILITARY TRAINING: EFFECTIVENESS FOR TECHNICAL 
SPECIALTIES INADEQUATELY MEASURED" 



DEPARTMENT OF DEFENSE COMMENTS 



****** 



FINDINGS 



FINDING A: Background: Recruit Quality. The GAO reported that,
if the entry level aptitude, knowledge, and skills of new
recruits should fall short of human requirements needed to
operate and maintain new technologically sophisticated weapons
systems, greater demands would be placed on the Armed Services to
compensate for the shortfall through training. The GAO observed
that the recruit quality had grown in the eighties, as evidenced
by the following statistics:

- in 1980, 68 percent of recruits were high school
graduates; by 1986, 92 percent had high school
diplomas; and

- in 1980, 65 percent of the recruits were in the top
three mental categories on the Armed Forces
Qualification Test, compared with 96 percent in 1986.

The GAO also reported that:

- the number of young people available for the military
recruit pool will continue to diminish until the
mid-1990s;

- by the year 2000, five of every six new labor force
entrants will be female, minority group members, or
immigrants; and

- the graduates of the American educational system are
said to be falling behind the youth of competitor
nations in technological literacy while, at the same
time, weapons systems become increasingly
sophisticated.

The GAO also reported that the Air Force has expressed concern
about the quality of recruits, the Navy noted an erosion of its
Delayed Entry Pool, and for the first time in 8 years, the Army
failed to meet its quarterly recruiting quota in the first
quarter of FY 1989. (pp. 1-1 to 1-5/GAO Draft Report)







DoD Response: Concur. While the statements attributed to the
Services are essentially correct, they do not provide the "big
picture." Since FY 1984, quality in the Air Force has remained
stable at 98 to 99 percent high school diploma graduates and
98 to 100 percent individuals who score average or above on the
enlistment test. Simultaneously, Air Force recruiting objectives
have fallen from 60,000 in FY 1984 to 43,000 in FY 1989, making
it easier to meet its goals with high quality. Although the Navy
Delayed Entry Program pool eroded in FY 1989, it is back on
target. And while the Army did not achieve its first quarter
FY 1989 recruiting objective (enlisting all but 475 of the 24,141
people it sought), it finished FY 1989 exceeding the objective.
In addition, the impact of the mid-1990s dip in the size of the
youth population will be moderated by reductions in accession
requirements that are likely to be part of the overall
downsizing of the military during this decade.

The GAO report also mentions that American youth are falling
behind youth of competitor nations in "technological literacy."
While the DoD is unaware of the existence of international
"technological literacy" data, it is the DoD objective to enlist
those youth who can acquire the skills to field sophisticated
weapon systems. To that end, the education of the nation's youth
is of paramount importance to the DoD. Given students' lackluster
performance on both national and international tests over the
last decade, the DoD has formed a collaborative, working
arrangement with the U.S. Department of Education, whereby the
Department is assisting them with development and fielding of new
international literacy tests. The DoD is also experimenting with
those same tests with hopes of improving the Joint-Service
enlistment test. The Department shares the GAO concern and hopes
to have much-improved, international comparative literacy data
over the next several years.

FINDING B: The Quality of Military Recruits--1981-1989 Test
Results. The GAO reported that the Armed Services Vocational
Aptitude Battery is comprised of ten subtests measuring abilities
considered important for Military Service. The GAO also reported
that all the Services use the same component subtests for two
composite scores: the Electronics Composite and the Armed Forces
Qualification Test, which is the primary mental criterion for
entry into the Armed Forces. The GAO found the following
regarding the Armed Forces Qualification Test:

- overall scores improved about 4 percent between 1981
and 1989;

- male recruit scores began and ended the decade
slightly higher than female scores;







- scores differed more substantially across racial
groupings than between genders;

- white recruits' scores began the decade 10 percent
higher than minority scores and ended 7 percent
higher;

- mean scores for all Services were significantly higher
in 1989 than 1981;

- Army scores began the decade substantially below those
of the other Services, but, by 1986, had reached the
same level as Navy and Marine Corps recruits; and

- average Air Force scores have consistently been higher
than the other Services and have not displayed their
tendency to plateau at mid-decade levels.



The GAO found the following regarding the Electronics Composite: 



- mean scores rose 2 percent between 1981 and 1989;

- scores peaked in 1984 and have shown a gradual decline
since then;

- female recruits scored approximately 5 percent lower
than male recruits during the eighties;

- white recruits scored about 11 percent higher than
minorities in 1981 and 9 percent higher by 1989;

- the narrowing of the gap for minorities, however, was
achieved in the first half of the decade; by 1989,
scores for all racial groups were declining;

- the interservice pattern of scores mirrors that of the
Armed Forces Qualification Test, with the Army making
up a 10 point difference with the Navy and Marines by
1986, and the Air Force on top throughout; and

- mean scores for the three Services changed very little
from 1985 to 1988, but Army and Navy scores declined
significantly in 1989. (pp. 2-1 to 2-7/GAO Draft
Report)



DoD Response: Partially concur. Although the individual
calculations have not been corroborated by the DoD due to time
constraints, trends reported in the Armed Forces Qualification Test
score data presented for comparison of groups (i.e., gender,
race/ethnicity, and Service) look reasonable, as do the trends







reported regarding the Electronics Composite. Some technical 
questions suggest, however, that clarification may be necessary 
in the GAO narrative. 

For example, the GAO report states that Armed Forces
Qualification Test "scores improved about 4 percent between 1981 and
1989." In other statements, various percentage changes are
mentioned for the Armed Forces Qualification Test and the
Electronics Composite. Computing percentage gains or changes in
subtest standard scores is not statistically appropriate. Scores
on the Armed Services Vocational Aptitude Battery, of which the
Armed Forces Qualification Test and the Composite scores are a
part, do not have a meaningful zero point and, therefore,
percentage changes cannot be interpreted. Computation of
percentages requires a ratio scale, which is more powerful than the
score scale for all aptitude tests, including the Armed Services
Vocational Aptitude Battery. The same limitation applies to
interpreting changes on the Electronics Composite.
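
The point about ratio scales can be illustrated with a short sketch. It takes two Electronics Composite overall means from Table I.8 (206.04 for 1981 and 210.67 for 1988) and the standard deviation in that table's note (22.19); the origin shifts of plus or minus 100 points are hypothetical and stand for the arbitrary zero point of a standard-score scale. The function names are illustrative only.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change; meaningful only when zero is a true zero."""
    return 100.0 * (new - old) / old


def standardized_difference(old: float, new: float, sd: float) -> float:
    """Difference in standard deviation units; unaffected by the origin."""
    return (new - old) / sd


MEAN_1981 = 206.04  # Table I.8, Electronics Composite overall mean
MEAN_1988 = 210.67  # Table I.8, Electronics Composite overall mean
SD = 22.19          # Table I.8 note

if __name__ == "__main__":
    # Hypothetical re-expressions of the same scores with a shifted origin.
    for shift in (0.0, -100.0, 100.0):
        old, new = MEAN_1981 + shift, MEAN_1988 + shift
        print(
            f"origin shift {shift:+7.1f}: "
            f"percent change = {percent_change(old, new):5.2f}%, "
            f"standardized difference = {standardized_difference(old, new, SD):.3f}"
        )
```

The same 4.63-point gain yields a different percentage for each choice of origin, while the difference expressed in standard deviation units does not move.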

Some factors related to changes in how scores have been computed
are relevant, particularly since the report examines scores
across several years. Between 1981 and 1989, there were several
changes in the Armed Forces Qualification Test (e.g., the
subtests used to compute the Armed Forces Qualification Test score
were changed and the reference population for norming of the test
was updated). It is unclear if the differences in how scores
were computed over the years were taken into account in the
analyses presented in Appendix I and Figures 1, 2, and 3;
clarification as to these differences appears appropriate, otherwise
comparisons of means will not be interpretable. The same sort of
changes occurred over the years in the calculation of the
Electronics Composite and would affect interpretation of Figures 5,
6, and 7.

Finally, with the large sample sizes achieved in the data
analyses, statistical significance can be observed for differences
that have relatively little practical significance. For example,
while the statement that "... Navy scores declined
significantly in 1989 (relative to 1988)" is true, the drop was from a
score of 211.58 in 1988 to a score of 210.40 in 1989. That small
a drop from one year to the next would be worth noting, yet not
cause for alarm.

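The distinction between statistical and practical significance can
be sketched numerically. The two means below are the Navy scores
quoted in the response; the standard deviation and the yearly sample
size are assumptions chosen only for illustration and do not come
from the report.

```python
import math


def two_sample_z(mean1, mean2, sd, n1, n2):
    """z statistic for a difference in means with a common, known SD."""
    standard_error = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def standardized_difference(mean1, mean2, sd):
    """Difference in means expressed in standard deviation units."""
    return (mean1 - mean2) / sd


MEAN_1988 = 211.58   # from the response
MEAN_1989 = 210.40   # from the response
ASSUMED_SD = 30.0    # assumption, not from the report
ASSUMED_N = 50_000   # assumption: recruits tested per year

z = two_sample_z(MEAN_1988, MEAN_1989, ASSUMED_SD, ASSUMED_N, ASSUMED_N)
p = two_sided_p(z)
d = standardized_difference(MEAN_1988, MEAN_1989, ASSUMED_SD)

print(f"z = {z:.2f}, p = {p:.2e}, difference = {d:.3f} SD")
```

With these assumed inputs the difference is many standard errors
from zero, yet it amounts to only a few hundredths of a standard
deviation.
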
FINDING C: The Quality of Military Recruits — Number of Recruits
Qualified for High Technology Specialties During the Period
1981-1989. The GAO reported that, as another measure of recruit
qualification trends, it enumerated the number of recruits whose
Armed Services Vocational Aptitude Battery scores met minimum
standards required for entry into two selected high technology
military specialties: (1) air traffic controllers and (2) systems
repair technicians. The GAO found the following for the air
traffic controller specialty:



- in 1981, approximately 38,000 recruits qualified for
the specialty and by 1986, more than 69,000 recruits
qualified — but, since then, the number qualifying has
declined to 58,000;

- in 1981, 87 percent of the qualifying recruits were
white males, while two-thirds of all recruits were
white males;

- by 1989, 84 percent of the qualifying recruits were
white males, while only 61 percent of the recruits
were white males; and

- while one third of the white males entering the Service
qualified on the basis of their Electronics
scores, fewer than 15 percent of the white females so
qualified and fewer than 10 percent of the minority
males and 3 percent of the minority females qualified
on the basis of their Electronics scores.



The GAO found the following for the Systems Repair Technician:



- in 1981, the number of qualified recruits for the
System Repair Technician specialty numbered 16,563 
and, by 1983, the number had increased sharply — but 
by 1989, it had fallen back to within 700 of the 1981 
level; and 

- the vast majority of those qualified were white 
males, of whom 11 percent qualified compared with 
less than 2 percent for other demographic groups. 



The GAO concluded that, based on its review, recruit quality
trends during the eighties are not reassuring. The GAO also
observed that fewer recruits are qualifying for the more demanding
technical occupational specialties. The GAO further concluded
that, with women and minorities forming the bulk of the
new entry labor force by the year 2000, providing well-trained
personnel for a technologically sophisticated military can be
expected to become increasingly difficult. The GAO also noted
that, in turn, the burden on training will increase, along with
the need to monitor its effectiveness. (pp. 2-7 to 2-11/GAO
Draft Report)

DoD Response: Partially concur. Providing well-trained personnel
will become increasingly difficult should recruit quality
diminish. However, the DoD does not consider that recruit quality
trends during the eighties, particularly the mid-to-late
1980s, are troublesome. During the last half of the decade,
recruit quality has never been better. Compared to the youth
population from which the DoD recruits, the quality level has
consistently been well above average. For example, in FY 1989,
92 percent of new recruits had a high school diploma, in contrast
to 74 percent in the youth population. Also, in FY 1989, 94
percent of new recruits scored average or above on the enlistment
test, compared to 69 percent in the youth population.

Although it is reasonable that the GAO would want to assess how
the aptitude of recruits for technologically sophisticated
specialties has changed since 1980, the methodology selected to do
so is flawed. Equating a decline on the Armed Services Vocational
Aptitude Battery's electronics composite to a decline in
recruits' "technological sophistication" is inappropriate. The
electronics composite is composed of four subtests that measure
mathematics ability (arithmetic reasoning and mathematics
knowledge), general science, and electronics information. As the
report Figure 8 indicates, the decline in performance on the
composite is driven primarily by the decline in performance on
one subtest — electronics information.

There is also a flaw in the example used by the GAO beginning on
page 2-8, wherein the report refers to the Air Traffic Control
specialty as having a minimum entry standard as of May 1989 of
230 on the Electronics composite (in standard score form). Air
Traffic Control, Air Force Specialty Code 272X0, is selected on
the General Composite and has never had an Electronics
requirement. That renders report Figure 9 incorrect, if based on the
composite described in the text. The GAO may have actually
performed its analyses on the specialty titled Aircraft Control
and Warning Radar Specialist, Air Force Specialty Code 303X2; in
report Table 3.7, that specialty is correctly reflected as having
an Electronics Composite qualifying score of 230.

The other specialty used by the GAO in this finding is Systems
Repair Technician, an occupation so specialized that it is not
assigned an Air Force Specialty Code, but is identified by a
Reporting Identifier (99104). It would be appropriate for the
report to mention that individuals qualifying for this specialty
are not qualified for a "typical" high-technology job, but are at
the very highest end of the technical continuum. A footnote
identifying the specialty and its cutoff score requirement would
be appropriate, similar to the footnote given at the bottom of
page 2-8 for the other specialty.

It is speculated that the test score decline on the electronics
information subtest is attributable to nationwide educational
curriculum changes. Over the course of this decade, dramatic
changes have occurred in public and private elementary and
secondary education programs. These reforms have been well
publicized and documented. As high school graduation standards have
become more stringent, students have had fewer opportunities to
take elective coursework. Consequently, enrollment in vocational
education courses, like electronics/electricity, has declined
dramatically. Throughout the 1980s, recruit quality, as measured
on the Armed Services Vocational Aptitude Battery's Armed Forces
Qualification Test composite, has improved. However, as the GAO
pointed out, performance on the electronics subtest/composite has
declined. Again, this is considered to be an artifact of the
educational reform movement. Students simply are no longer
enrolling in the technical and trade vocational classes where
they can learn basic electronics/electrical constructs.

The electronics composite is a valid predictor of success in
training and on the job for occupational specialties requiring
electronics/electrical knowledge. Given that it is also known
that youth are taking fewer formal courses in this area prior to
entry into the military, the DoD is interested in improving its
ability to select and classify recruits into electronics-related
occupations. To that end, there is research in progress to
improve the content of the current enlistment test. A number of
large-scale research projects, on both new paper-and-pencil and
computerized tests, are underway in hopes of finding better
predictors of performance in military training and occupations.

The Department reiterates, however, that it is inappropriate to 
equate performance on the electronics composite with recruits' 
overall "technological sophistication" and to conclude that this
sophistication has declined over the decade of the 1980s. Unfor- 
tunately, there is no way to conduct a historical study on this 
subject. The DoD concurs with GAO researchers that the youth and 
entry-level labor force demographics are changing and that the 
Department needs to study carefully the effects of its enlistment 
test and concomitant composites on the people (e.g., women, 
minorities) that will be recruited in the future. To that end, 
the results from enlistment test research described above are 
expected to be helpful in making future enlistment test
decisions.

FINDING D: Schoolhouse Measures of Training Effectiveness — Army.
The GAO reviewed course grades in Army advanced individual training
courses for five occupational specialties to determine the
extent to which appropriate data were available to the Military
Services for use in judging training effectiveness. The GAO
found that the course grades for the five specialties were not
equally reliable indicators of performance during training. The
GAO noted, for instance, that at Fort Gordon it was unable to
find a consistent relationship between milestone measures and
final grades, nor was it able to locate anyone who could suggest
a relationship. The GAO concluded that the grades recorded for
two of the courses (36L and 39B) could not be used to discriminate
reliably among the performance of individual trainees. The
GAO found inconsistencies in scoring between different classes
and even within the same class. The GAO also found that Fort
Gordon's grades (unlike Redstone's grades) were based partially
on measures of physical conditioning that appeared to be unrelated
to job performance. The GAO concluded that the psychometric
differences it found at Fort Gordon appeared to be the result
of a number of factors including (1) questionable data entry
procedures and software and (2) the pass/fail nature of the
criteria used to evaluate student progress. The GAO suggested that
subject matter experts need to develop more finely tuned,
objective, and reliable measures of performance than "go/no-go." The
GAO noted that, because of the problems encountered at Fort
Gordon, it excluded those courses from its sample of Army
trainees, resulting in the inclusion of all recruits who completed 24J
and 27N training between October 1987 and July 1989, and
approximately one-third of those who completed 29V training during the
same period.

The GAO found that, on the Armed Forces Qualification Test and
the Electronics Composite, male trainees scored significantly
higher than did females and white trainees performed better than
minority students. The GAO further found that the training
performance differences correspond with the test score
differences on both tests for the racial groupings. The GAO noted that
for gender, training performance differences between males and
females were larger than test score differences. The GAO also
found that the Electronics Composite is a better predictor of
success than the Armed Forces Qualification Test.

The GAO further found that, for its entire sample, the score on
the Electronics Composite explains 18 percent of the variation in
course grades, more than the Armed Forces Qualification Test — and
a GAO-developed "factor score," which is the weighted sum of all
Armed Services Vocational Aptitude Battery subtests. The GAO
concluded that, for males, the Electronics Composite score appears
to be a better predictor of future performance than the Armed
Forces Qualification Test. The GAO found, however, that for
females, the Armed Services Vocational Aptitude Battery "factor
scores" are better predictors of schoolhouse performance than the
Armed Forces Qualification Test, which is a better predictor than
the electronics composites. The GAO noted that for minority
soldiers, the ability to predict training course grades based on
test scores is the weakest of all groups. The GAO concluded that
the Armed Forces Qualification Test, or some other general score
from the Armed Services Vocational Aptitude Battery, may provide
a better predictor of success for women recruits in
electronics-related training than does the Electronics score. The GAO
further concluded that better predictors of training performance are
needed for minority students. (pp. 3-1 to 3-7/GAO Draft Report)

DoD Response : Partially concur. The Army's testing procedures 
for soldiers undergoing Advanced Individual Training are designed 
to ensure that soldiers achieve specified training objectives. 
To accomplish this, criterion-referenced hands-on performance 
tests are administered and scored on a "go/no-go" basis. Such 
tests are routinely used in the military to evaluate training 
effectiveness because they provide meaningful information to 
course managers on student performance, as well as information on 
the degree to which the course is meeting its stated objectives.
Given that such tests are not designed to measure the relative
performance of individuals (i.e., these measures are not
norm-referenced), it is neither surprising nor particularly disturbing
that the GAO found such test results unsuitable for correlational
analysis. Criterion-referenced measurement, such as the
"go/no-go" measures used by the Army, is a psychometrically
sound method when mastery learning is the goal of instruction, as
is the case under discussion.

As with other findings in the report that describe trends in the 
Armed Forces Qualification Test scores and examine differences 
for groups (e.g., gender and race/ethnicity), the statements 
about training performance differences appear reasonable. How- 
ever, there are problems with some of the specific analyses the 
GAO indicates were performed to reach those conclusions. For 
example, in the Army sample, students from three courses were 
pooled to increase the sample size and the course grades for the 
various specialties were assumed to be on the same score scale, 
or to have the same meaning. In fact, course grades tend to be 
on course-unique metrics and there is no way to evaluate whether
a score of, say, 90 in one course means the same in terms of
competence as a score of 90 in another course. Thus, the mean
reported as an average of grades for the three Army courses is
not meaningful and the relationship to scores from the Armed
Services Vocational Aptitude Battery is tenuous. Note that for
large samples, such as white males, the differences in the score
scales tend to average out and the correlation coefficients are
reasonably interpretable. For small samples, however, the
different scales for course grades are likely to distort the
correlation coefficients and means. Since the same analyses of
schoolhouse measures of effectiveness were used for each Service
(Findings D, E, and F), additional comments applicable to all
appear in the DoD response to Finding G, the summary finding on
schoolhouse measures.
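
The pooling concern can be made concrete with a deliberately extreme
sketch. All of the numbers below are hypothetical; they are
constructed so that two courses grade on different scales, and they
do not represent any Army course. Standardizing grades within each
course before pooling is shown only as one possible remedy; it is
not a procedure described in the GAO report or in this response,
and it removes any real difference in competence between courses.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def standardize(values):
    """Convert values to z-scores within their own course."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical course A: lower aptitude scores, lenient grading scale.
course_a = {"aptitude": [100, 105, 110, 115, 120],
            "grade": [80, 82, 85, 87, 90]}
# Hypothetical course B: higher aptitude scores, harsher grading scale.
course_b = {"aptitude": [120, 125, 130, 135, 140],
            "grade": [60, 62, 65, 67, 70]}

r_within_a = pearson(course_a["aptitude"], course_a["grade"])
r_within_b = pearson(course_b["aptitude"], course_b["grade"])

pooled_aptitude = course_a["aptitude"] + course_b["aptitude"]
r_pooled_raw = pearson(pooled_aptitude,
                       course_a["grade"] + course_b["grade"])
r_pooled_std = pearson(pooled_aptitude,
                       standardize(course_a["grade"])
                       + standardize(course_b["grade"]))

print(r_within_a, r_within_b, r_pooled_raw, r_pooled_std)
```

In this constructed case aptitude and grades are almost perfectly
related within each course, yet the pooled raw grades give a
negative correlation; the pooled within-course z-scores give a
positive one.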







FINDING E: Schoolhouse Measures of Training Effectiveness — Navy.
The GAO reported that it examined scores on four training
courses: (1) Sonar Technician Anti-Sub Warfare Surface, (2) Sonar
Technician Anti-Sub Warfare Subsurface, (3) Aviation Fire Control 
Technician, and (4) Aviation Anti-Sub Warfare Technician. The 
GAO found the following: 

- male recruits entered training with significantly 
lower Armed Forces Qualification Test scores and
significantly higher electronics scores than females; 

- final grades for males were slightly, but signifi- 
cantly lower than their female classmates, suggesting 
that a substantial advantage in the Armed Forces Qual- 
ification Test can overcome an advantage in Electron- 
ics; and 

- minority students began training with substantially 
lower scores on both composites but their final grades 
were not significantly different. 

The GAO drew the following conclusions:

- that the Armed Forces Qualification Test may be more 
important for training success than Electronics;

- that for most Navy groupings, the Armed Forces
Qualification Test scores are better predictors of
schoolhouse performance than Electronics scores;

- that for females, the Electronics composite is the
weakest predictor and the "factor score" is the
strongest; and

- that the ability of any of the three scores to predict 
training success is weakest for minorities. (pp. 3-7 
to 3-8/GAO Draft Report) 

DoD Response: Partially concur. While the GAO concluded that
the Armed Forces Qualification Test may be more important for
predicting training success than the Electronics composite and
that for most Navy groupings, the Armed Forces Qualification Test
scores are better predictors of schoolhouse performance than
Electronics scores, a recent Navy Personnel Research and
Development Center validation report found the opposite result, with an
average validity coefficient of .59 for predicting "A" school
success from the Composite vs. an average coefficient of .46 for
prediction from the Armed Forces Qualification Test.
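
Validity coefficients such as those quoted above are correlations,
while the GAO findings are stated as percentages of variation
explained. The sketch below only converts between the two units,
using the standard relationship that the proportion of variance
explained by a single predictor is the square of its correlation
with the outcome. The figures come from different samples and
Services, so no direct comparison between them is implied.

```python
import math


def variance_explained(r):
    """Proportion of outcome variance explained by one predictor."""
    return r * r


def implied_correlation(proportion):
    """Size of the correlation implied by a proportion of variance."""
    return math.sqrt(proportion)


# Navy validation report coefficients quoted in the response.
composite_share = variance_explained(0.59)
afqt_share = variance_explained(0.46)

# GAO finding for its Army sample: 18 percent of variation explained.
army_r = implied_correlation(0.18)

print(round(composite_share, 3), round(afqt_share, 3), round(army_r, 3))
```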






The report also states that the Electronics Composite is the
weakest predictor and the Factor score is the strongest for
females. However, statistical results from such a small sample
(76 females) would not be stable enough to warrant policy
changes. The results reported by the GAO, in all probability,
would not be replicated given a larger sample. Also, the
validity coefficients adjusted for range restriction in report
Table 3.6 show for the Female Factor Score composite an increase
of .42. That result is suspect, as normally such adjustments
rarely provide an increase of more than .20.
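
The instability of a correlation computed on 76 cases can be
sketched with an approximate confidence interval based on the
Fisher z transformation. The sample size of 76 is from the text;
the correlation of .30 and the comparison sample of 5,000 are
assumptions chosen only for illustration.

```python
import math


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95 percent confidence interval for a correlation."""
    z = math.atanh(r)                   # Fisher z transformation
    se = 1.0 / math.sqrt(n - 3)         # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


ASSUMED_R = 0.30                        # assumption, not from the report

small_low, small_high = fisher_interval(ASSUMED_R, 76)
large_low, large_high = fisher_interval(ASSUMED_R, 5000)

small_width = small_high - small_low
large_width = large_high - large_low

print(f"n = 76:   {small_low:.2f} to {small_high:.2f}")
print(f"n = 5000: {large_low:.2f} to {large_high:.2f}")
```

Under these assumptions the interval for 76 cases is several times
wider than the interval for the larger sample, which is the sense in
which a small-sample coefficient may fail to replicate.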

It should also be noted that only one of the four training
courses represented is even open to women (Aviation
Anti-Submarine Warfare Technician), which is not evident without close
study of report Table 3.6. The data for males in report Table
3.6 is the result of merging four training courses and produces
an unorthodox analysis that requires an explanation of grading
differences which may exist for the different schools.

As with the previous finding, trends in the Armed Forces
Qualification Test scores and the Electronics Composite in Navy courses,
including differences for groups (e.g., gender and
race/ethnicity), appear reasonable with respect to schoolhouse measures of
training effectiveness. However, the problems with some of the
specific analyses the GAO indicates were performed to reach those
conclusions remain a factor. In the Navy sample, students from
four courses were pooled to increase sample size and the
assumption that course grades for the various courses have the same
meaning is tenuous. That limits the confidence in interpretation
of the relationship to scores from the Armed Services Vocational
Aptitude Battery. Note that for large samples, such as white
males, the differences in the score scales tend to average out,
and the correlation coefficients are reasonably interpretable.
For small samples, however, the different scales for course
grades are likely to distort the correlation coefficients and
means. Additional comments applicable to all appear in the DoD
response to Finding G, the summary finding on schoolhouse
measures.

FINDING F: Schoolhouse Measures of Training Effectiveness — Air
Force. The GAO reported that it examined four Air Force
courses — (1) Aircraft Control and Warning Radar Specialist, (2)
Automatic Tracking Radar Specialist, (3) Photo-Sensors Maintenance
Specialist, Tactical Reconnaissance Sensors, and (4) Photo-Sensors
Maintenance Specialist, Reconnaissance Electro-Optical
Sensors. The GAO found that, like the Navy, (1) "factor scores"
are as good or better predictors than composites, (2) for the
female students, the Armed Forces Qualification Test scores and
factor scores outpredict Electronics scores, and (3) it is most
difficult to predict course grades for minority students.







although factor scores explained 10 percent (46 percent after
adjustment). The GAO concluded that because of problems with
some Army data, and the special preparation of data by the Navy
and Air Force, it would not be appropriate to make inter-Service
comparisons or make firm judgments about the immediate
availability of psychometrically suitable measures from the Navy and
the Air Force (pp. 3-8 to 3-10/GAO Draft Report).

DoD Response: Partially concur. As with other findings in the
report, which describe trends in the Armed Forces Qualification
Test scores and examine differences for groups (i.e., gender and
race/ethnicity), the statements about training performance
differences appear reasonable. The problems with some of the
analyses the GAO indicates were performed to reach those conclusions
restrict interpretability of the findings, as was stated in the
DoD response to Findings D and E. Additional comments appear in
the DoD response to Finding G, the summary finding on schoolhouse
measures. The DoD does concur, however, with the final statement
in Finding F, which indicates it would not be appropriate to make
inter-Service comparisons. In addition, research performed by
the Air Force Human Resources Laboratory confirms many of the GAO
findings about general ability (such as is measured in the Factor
Scores the GAO examined) as a valuable predictor of schoolhouse
performance.

FINDING G: Schoolhouse Measures of Training Effectiveness — Summary.
The GAO raised questions about the differential success in training
for males and females and for whites and minorities, and about
the differential predictive validity of the Armed Services
Vocational Aptitude Battery for these groups. The GAO concluded that
its analysis of gender and race-related differences in mean Armed
Services Vocational Aptitude Battery scores and course grades in
the Army suggests that the Electronics Composite is an efficient
simple predictor of training success. The GAO found, however,
that in the Navy and Air Force, a more complex relationship
exists between the Armed Services Vocational Aptitude Battery
scores and course grades. The GAO noted that gender and
race-related differences in course grades were quite small compared with
significant differences in Electronics scores. The GAO concluded
that an advantage in more general aptitude, measured by the Armed
Forces Qualification Test, can compensate for a deficit in
Electronics — when the deficit is not too great.

The GAO also noted that, while the Armed Services Vocational
Aptitude Battery's Electronics composite score demonstrated a
moderate ability to predict training success for white students
and males, it was less successful for female or minority
students. The GAO concluded the Factor Score that it derived was,
in most cases, the best predictor of training success because it
utilized information from all ten Armed Services Vocational
Aptitude Battery subtests.

The GAO concluded that, based on its work, it was impossible to 
determine whether the Armed Services Vocational Aptitude Battery 
is a weaker measure of ability for some groups — or if some other 
factor in schoolhouse training contributes differentially to the
success of the different groups. The GAO noted that the relative 
inconsistency between school grades and test scores exists and 
should be addressed by both the recruiting and training communi- 
ties. The GAO further concluded that it will become increasingly 
incumbent on the Services (1) to optimize selection criteria for
advanced individual technical training for women and minority
groups, (2) to provide compensatory training where needed, and 
(3) to assure that no extraneous factors within the training 
environment interfere with the full achievement potential. (pp. 
3-10 to 3-13/GAO Draft Report) 

DoD Response: Partially concur. With respect to GAO findings
describing trends in the Armed Forces Qualification Test scores
and the Electronics Composite and examining differences for
groups (i.e., gender and race/ethnicity), the statements about
training performance differences appear reasonable. The analyses
of the relationships of scores from the Armed Services Vocational
Aptitude Battery (Armed Forces Qualification Test, Electronics
Composite, and Factor Score) and school grades are flawed and,
consequently, interpretation of the results of those analyses is
doubtful. Because the same analytic procedures were used for all
Services and similar conclusions drawn, the following comments
pertain to Findings D, E, F, and G alike.

Problems with the analyses arise from the following sources: 



- pooling students from several courses, when the grades
for different courses generally are not comparable;

- correction for restriction of range on the Factor
Score, which resulted in correlation coefficients that
are not plausible;

- lack of regression analyses; and

- small sample sizes for females.



In each Service, students from several courses were pooled to
increase sample size and the course grades for the various
courses within each Service were assumed to be on the same score
scale, or to have the same meaning. In fact, course grades are
not normally interpretable from course-to-course, because of
between-course differences in scales and the level of competency
inferred by a particular score. There is no way to evaluate
whether a score of, say, 90 in one course means the same as a
score of 90 in another course. (For the Army, three courses were
combined, four courses for the Navy, and four for the Air Force.)
Thus, the mean grades reported for courses in each Service are
somewhat arbitrary numbers and their relationship to scores from
the Armed Services Vocational Aptitude Battery is tenuous. Note
that for large samples, such as white males, the differences in
the score scales tend to average out, and the correlation
coefficients are reasonably interpretable. For small samples, however,
the different scales for course grades are likely to distort the
correlation coefficients and means.

The correlation coefficients for the Factor Scores are
suspiciously high, especially after correction for restriction of
range. The Factor Scores are based on the first principal
component of the Armed Services Vocational Aptitude Battery and the
weights tend to be uniform (from .10 to .14). The Factor Score
is the sum of the 10 subtest standard scores and the correlation
coefficient could be computed using the correlation of sums. An
important point is that the weights are not regression weights
computed to maximize the correlation between the aptitude test
scores and course grades; instead, the correlation coefficient
for the Factor Score is, in effect, the average for the 10
subtests.
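
The difference between a sum with fixed weights and a sum with
regression weights can be sketched for the simplest case of two
standardized subtests. The validities and the intercorrelation below
are hypothetical and are not values from the Armed Services
Vocational Aptitude Battery. The first function is the standard
formula for the correlation of a weighted sum with a criterion; the
second is the standard two-predictor multiple correlation.

```python
import math


def composite_validity(weights, validities, intercorrelations):
    """Correlation of a weighted sum of standardized tests with a criterion."""
    numerator = sum(w * r for w, r in zip(weights, validities))
    variance = sum(weights[i] * weights[j] * intercorrelations[i][j]
                   for i in range(len(weights))
                   for j in range(len(weights)))
    return numerator / math.sqrt(variance)


def multiple_r_two_predictors(r1, r2, r12):
    """Multiple correlation when two predictors get regression weights."""
    r_squared = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return math.sqrt(r_squared)


# Hypothetical values: subtest validities and their intercorrelation.
R1, R2, R12 = 0.50, 0.30, 0.40

unit_weighted = composite_validity([1.0, 1.0], [R1, R2],
                                   [[1.0, R12], [R12, 1.0]])
regression_weighted = multiple_r_two_predictors(R1, R2, R12)

print(round(unit_weighted, 3), round(regression_weighted, 3))
```

With these assumed inputs the uniformly weighted sum correlates
about .48 with the criterion and the regression-weighted
combination about .51. Fixed weights can at most match fitted
weights in the sample on which the weights are fitted.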

In previous studies, the four subtests in the Electronics Compos- 
ite (Math Knowledge, Arithmetic Reasoning, General Science, and 
Electronics Information) repeatedly tend to have the highest 
correlation with course grades in these kinds of courses. As a
rule, therefore, the correlation with course grades should be 
higher for the Electronics Composite than for the Factor Score. 
Deviations from this expectation may be attributed to artifacts, 
such as restriction of range. 

The GAO report recognizes that correlation coefficients in
samples cannot be compared directly because of range restriction.
Adjustments are made to compensate for differences in restriction
of range. The adjusted values for the Armed Forces Qualification
Test and Electronics Composite are plausible in that they are
consistent with other analyses; the adjusted values for the
Factor Score, however, are unduly high and they lack
plausibility. The procedure used to correct for restriction of range
should be based on the multivariate model, which involves complex
formulae and computing routines. The simpler univariate model
may have been used, which could distort the adjusted values for
the Factor Score.
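
A univariate correction of the kind referred to above is commonly
written as Thorndike's Case II formula, which adjusts an observed
correlation by the ratio of the unrestricted to the restricted
standard deviation of the predictor. The sketch below shows only
that univariate formula; the source does not state which formula
the GAO applied, the input values are hypothetical, and the
multivariate procedure recommended in the response is not shown.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct restriction on the predictor.

    sd_ratio is the unrestricted SD divided by the restricted SD.
    """
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(1.0 + r_restricted ** 2 * (sd_ratio ** 2 - 1.0))
    return numerator / denominator


# Hypothetical inputs, not taken from the report.
OBSERVED_R = 0.30
ASSUMED_SD_RATIO = 1.5

corrected = correct_for_range_restriction(OBSERVED_R, ASSUMED_SD_RATIO)
unchanged = correct_for_range_restriction(OBSERVED_R, 1.0)

print(round(corrected, 3), round(corrected - OBSERVED_R, 3))
```

With these assumed inputs the corrected coefficient is about .43,
an increase of about .13. A ratio of 1.0 leaves the coefficient
unchanged.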






Comparisons are made by gender and n inority status based on mean 
scores and correlation coef f icients • Conclusions about the 
appropriateness of the Armed Services Vocational Aptitude Battery 
for females and racial/ethnic minorjities are then based on these 
comparisons. Such comparisons are a good place to start, but 
analyses of gender and race differences should include a compari- 
son of the respective regression lines (slopes and intercepts), 
errors of estimate, and cutoff scores. Analyses of differences 
in mean performance on predictors, final school grades, and 
differences in validity coefficients are not, by themselves, 
sufficient. With the more thorough regressior. analysis, meaning- 
ful conclusions can be raade about the appropriateness of aptitude 
tests for female and racial/ethnic minorities compared to white 



Even if the DoD were to fully concur with the statistical analyses 
performed, interpretation of the results for females would 
remain problematic because of the small sample sizes. The numbers 
of females with course grades in the samples are 18 for the Army, 
71 for the Navy, and 98 for the Air Force. With such sample 
sizes, differences in scales for course grades may be exacerbated; 
correction for range restriction could lead to illogical 
correlation coefficients; and regression equations with up to 10 
predictor variables would result in unduly high correlation. 
Issues of generalizing to other samples and of making policy 
decisions about selecting females and assigning them to technical 
specialties should always be considered extremely carefully and 
be based on thorough analysis. Replication of results is the 
sine qua non of analysis and an adequate sample size is a good 
foundation for replication. The conclusion "that the Services 
should consider developing a more general ASVAB (sic) derivative 
such as our Factor Score to assign women and minorities to technical 
training" (p. 5-2 and 3) is reasonable, and could be pursued 
by the military manpower research community. The report 
provides a stimulus to continue efforts to improve the effectiveness 
of selecting and classifying recruits, especially for minorities. 
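
The concern about 10 predictors and small samples can be made concrete with two textbook results that are not taken from the report. When the predictors are in fact unrelated to the criterion, the expected sample R-squared from a least-squares fit with an intercept is k/(n-1), and the usual adjusted R-squared shrinks an observed value accordingly. The sketch below applies both to the sample sizes quoted above; the function names and the observed R-squared of 0.60 are hypothetical.

```python
def expected_chance_r2(n, k):
    """Expected sample R^2 when k predictors are unrelated to the criterion (n cases)."""
    return k / (n - 1)

def adjusted_r2(r2, n, k):
    """Conventional adjusted R^2 for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

K = 10              # "up to 10 predictor variables"
OBSERVED_R2 = 0.60  # hypothetical observed value, for illustration only
samples = {"Army": 18, "Navy": 71, "Air Force": 98}  # females with course grades

results = {
    service: (expected_chance_r2(n, K), adjusted_r2(OBSERVED_R2, n, K))
    for service, n in samples.items()
}

for service, (chance, adjusted) in results.items():
    print(f"{service}: chance R^2 = {chance:.3f}, adjusted R^2 = {adjusted:.3f}")
```

With 18 cases, chance alone would be expected to produce an R-squared near 0.59, and a hypothetical observed 0.60 shrinks to almost nothing after adjustment. With 71 or 98 cases the same observed value shrinks far less.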

FINDING H: Field Measures of Training Effectiveness -- Army. The 
GAO reported that, although it was aware of numerous post-training 
evaluation activities performed by the individual services, 
only the Army could provide individual performance measures. The 
GAO reported that, by Army regulation, a soldier's occupational 
specialty performance is tested within 6 months of completion of 
training and every year thereafter under the Skills Qualification 
Test program. The GAO found the following regarding the 
Skills Qualification Test scores: 



- the best predictors of Skill Test scores are final 
schoolhouse grades; 

- the Armed Forces Qualification Test and Electronics 
scores were also significantly related to the Skill 
Test scores for whites and males, but factor scores 
consistently outpredicted the composites; 

- for females and non-white soldiers, the Armed Services 
Vocational Aptitude Battery scores were not positively 
related to future performance, as measured by Skill 
Qualification Test scores; and 

- the grades scored by females at the schoolhouse were 
inversely correlated with the Skill Qualification Test 
scores. 



The GAO concluded that the traditional Armed Services Vocational 
Aptitude Battery scores may not be the best predictor of performance 
for the non-traditional soldier, that is, the female or 
minority soldier. The GAO observed that better predictors of 
success for these groups should be found. (pp. 4-1 to 4-5/GAO 
Draft Report) 

DoD Response: Partially concur. The GAO appears to have incorrectly 
assumed that Skill Qualification Tests have a common 
metric across different specialties, skill levels, and years. 
Due to the requirement to develop new tests each year, individual 
tests are fielded with a minimum of pretesting. As a result, 
means and standard deviations across a specialty and even across 
years within the same specialty and skill level may vary greatly. 
For example, in the five specialties studied by the GAO, the 
means on the individual skill level 1 test during 1985-198^ 
ranged from 74.5 to 88.4, while standard deviations ranged from 
3.5 to 14.7. 
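
The common-metric problem can be illustrated with a small sketch. One conventional remedy, which the response does not itself prescribe, is to convert each score to a z-score within its own test before pooling. The function name `standardize` and both score lists are hypothetical.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores within a single test administration."""
    m = mean(scores)
    s = pstdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical raw scores on two tests with different means and spreads.
test_a = [70, 75, 80, 85, 90]   # mean 80, wide spread
test_b = [84, 86, 88, 90, 92]   # mean 88, narrow spread

z_a = standardize(test_a)
z_b = standardize(test_b)

# The same raw score of 90 sits at a different relative position on each test.
z_90_on_a = z_a[test_a.index(90)]
z_90_on_b = z_b[test_b.index(90)]
print(f"raw 90 on test A: z = {z_90_on_a:.2f}")
print(f"raw 90 on test B: z = {z_90_on_b:.2f}")
```

Pooling raw scores from tests like these would treat the two 90s as equal, even though they sit at different relative positions within their own tests.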

During the years 1985-1989, more than 3800 different tests were 
administered in more than 200 specialties annually across skill 
levels 1 to 4. The Army Research Institute is currently analyzing 
this data (more than 1 million scores) and intends to report 
Armed Services Vocational Aptitude Battery validities by both 
race and gender as well as for sample size whenever sample size 
is adequate for such analyses. Noting the GAO concern relating 
to low validity for blacks and females in their study, the Army 
has computed validities for these groups for the 198£ Skill 
Qualification Tests. For 71 skill level 1 samples comprised of 
at least 50 females, the median corrected validity is .58; for 
samples of 50 or more blacks the median validity is .47; the 
median validity for 205 total samples is .57. While the Army 
understands the GAO focused only on highly technical specialties, 







total accessions in the five GAO selected specialties numbered 
only 310 compared to more than 120,000 for all specialties during 
1988. 

It is suspected that the finding is affected by the small samples 
of females and minorities in the GAO analyses. The finding that 
Armed Services Vocational Aptitude Battery scores were not posi- 
tively related to Skill Qualification Test scores for females and 
non-white soldiers is contrary to the body of research evidence 
for predicting training grades in the schoolhouse. The consistent 
finding in all Services is that aptitude scores are about 
equally valid for females, racial/ethnic minorities, and white 
males, although there may be some over or underprediction for 
females and minorities. Research results also show that aptitude 
tests predict supervisors' ratings of job performance for blacks 
about as well as for whites. The results presented by the GAO 
should be evaluated in larger samples. 

The same problems noted earlier with analysis of schoolhouse 
training grades apply to this analysis of Skill Qualification 
Test scores: 



- pooling of specialties -- Skill Qualification Test 
scores are not on a common metric across specialties, 
and the same numerical value in different tests does 
not, as a rule, mean the same level of competence; 

- the correction for restriction of range on the Factor 
Score leads to distortion in the results; 

- a regression analysis is appropriate and was not per- 
formed; and 

- the sample size of females (18 or 21) is inadequate to 
draw meaningful conclusions. 
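
The range-restriction point in the list above can be illustrated with the widely used univariate correction often called Thorndike's Case II. Whether this is the exact univariate model applied in the GAO analysis is an assumption. The function name, the standard deviations, and the correlations below are hypothetical.

```python
import math

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate (Case II) correction of a correlation for direct range restriction."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 + r * r * (u * u - 1))

# Hypothetical: applicant-population SD of 10, selected-sample SD of 5.
corrected_pos = correct_range_restriction(0.30, 10.0, 5.0)
corrected_neg = correct_range_restriction(-0.20, 10.0, 5.0)

print(f"observed  0.30 -> corrected {corrected_pos:.3f}")
print(f"observed -0.20 -> corrected {corrected_neg:.3f}")
```

The correction enlarges the magnitude of whatever correlation it is given, whatever its sign. A coefficient from a very small sample that is negative only through sampling error therefore becomes a larger negative value after correction, which is one way such corrections can yield results that are hard to interpret.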



Research in progress pertaining to enlistment test development, 
including computerized tests, will examine implications for 
gender and minority subgroups. 

FINDING I: Field Measures of Training Effectiveness -- Navy. The 
GAO reported that it considered two possible sources of field 
information routinely collected by the Navy as measures of the 
effectiveness of the training courses: (1) Level II surveys and 
(2) Advancement in Rating Examinations. The GAO found, however, 
that the Level II surveys have been effectively abandoned by the 
Navy, with none having been performed since at least 1986. The 
GAO concurred with the judgement of the test developers and 
administrators that, because the test is not standardized and is 







not administered to all graduates, the Advancement in Rating 
Examination is "not a good source of training evaluation feed- 
back." 

The GAO reported that, in 1986, the Chief of Naval Operations 
requested that the Naval Training Systems Center determine the 
current status of Navy training evaluation and provide recommen- 
dations. The GAO further reported that, while numerous non-for- 
mal or non-centralized activities were identified, the Naval 
Training Systems Center found that: 

- the quality of current Navy schoolhouse training 
could not be readily ascertained for the vast majority 
of the courses being offered; 

- there is a lack of technical evaluation/assessment 
skills; and 

- current evaluation activities are fractionated, not 
comprehensive, and operating in an environment of 
obsolete instructions and unclear objectives. 

The GAO reported that the Navy made a number of recommendations 
to upgrade and take a systematic approach to training evaluation. 
According to the GAO, the Navy has assigned a three-person team 
to review the proposals and recommend an integrated training 
appraisal program. The GAO concluded that, while the Navy should 
be commended for its willingness to acknowledge past evaluation 
deficiencies, it seriously questioned whether this response is 
appropriate to the severity and extensiveness of the problems 
that the Naval Training Systems Center has documented. (pp. 4-5 
to 4-8/GAO Draft Report) 

DoD Response: Partially concur. Level II surveys were discontinued 
by the Navy because they were paper-intensive and placed 
an undue burden on the fleet. Moreover, only limited methods of 
evaluating the effectiveness of schoolhouse training were in 
effect at the time the Navy requested the Naval Training Systems 
Center to determine the status of evaluation procedures and make 
appropriate recommendations. Since that time, however, the Navy 
has successfully employed several means of collecting feedback on 
training effectiveness. In addition to the steps being taken by 
the Navy to enhance training evaluation methods as reported by 
the GAO, several other programs are underway. These include the 
(1) Navy Training Appraisal Program, (2) Navy Training Requirements 
Review, (3) Fleet Training Appraisal Program, and (4) 
Maintenance Training Improvement Program. These are discussed in 
more detail in the following paragraphs. 






A Navy training appraisal program was implemented in March 1989. 
The process provides the Chief of Naval Operations with an 
assessment of the adequacy of Navy training to support warfighting 
capabilities in each of the Navy's primary mission areas and 
focuses attention on specific areas where training may be deficient. 
The training appraisal program allows scarce training 
assessment resources to be brought to bear upon those training 
programs that fleet feedback reveals are most in need of attention. 
The Navy training appraisal process has thus far examined 
acoustic operator, damage control/firefighting, electronic warfare 
operator/maintainer, and "over-the-horizon" targeting systems 
training. 

There is also an ongoing Navy Training Requirements Review, which 
provides direct feedback between warfare sponsors, Systems Commands, 
the fleet, and the Naval Education and Training Command on 
a scheduled basis. That program requires fleet experts to talk 
directly to school personnel and provides valuable information on 
training effectiveness. 

Additional training effectiveness feedback systems in place 
include the Fleet Training Appraisal Program and the Maintenance 
Training Improvement Program which provide fleet performance 
data. The Training Performance Evaluation Board Training Evaluation 
and Assessment Division was staffed in February of 1990 and 
has as part of its charter the study of training feedback 
systems. 

FINDING J: Field Measures of Training Effectiveness -- Air Force. 
The GAO reported that it considered sources of individual level 
data for field performance of Air Force personnel equivalent to 
those it used for the Navy, but concluded that neither the promotion 
examinations nor the supervisory surveys were appropriate. 
The GAO further concluded no individual data exist that would 
allow an analysis equivalent to those performed by the Army with 
the Skill Qualification Test data. 

The GAO reported that other Air Force training assessment procedures 
exist, including Training Quality Reports, Utilization and 
Training Workshops, and Occupational Survey Reports. According 
to the GAO, the Training Quality Reports are part of a reactive 
evaluation process, while the other activities are more concerned 
with front-end analysis. (pp. 4-8 to 4-10/GAO Draft Report) 

DoD Response: Partially concur. The Air Force is aware of the 
potential shortcomings of promotion examinations and supervisory 
surveys for evaluating training effectiveness, and is currently 
developing career field training management guidelines to track 
and enhance the training from enlistment throughout an individ- 
ual's career. Emphasis will be placed on criterion-referenced 






objectives rather than the present code levels for performance 
standards. These changes will have a major impact on the present 
promotion system. To expedite feedback from supervisors concerning 
any problems with recent graduates, a new policy was recently 
established by the Air Training Command to provide telephonic 
communication on a 24-hour basis between the training center 
providing the training and the supervisor of the graduate. The 
system allows more effective and timely communication between the 
supervisor and the training provider. 

The Air Force does not have Skill Qualification Tests for performance 
and does not plan to have them in the near future. Many of 
the tasks performed in the field are very complex. Testing, 
recording, and documenting individual performance for statistics 
is very time consuming, requires additional manpower, and is 
cost-prohibitive. Further, many of the new Air Force systems are 
single channel systems, which cannot be used for extensive train- 
ing or evaluating trainees. All these factors combine to make 
the use of hands-on Skill Qualification Tests an inappropriate 
solution to the problem of training effectiveness evaluations. 
The GAO finding that Occupational Survey Reports are concerned 
with front-end analysis is true, but information about what 
first-termers are doing on-the-job provides a good basis for what 
should be trained and what is expected in the initial skills 
courses. As written in the report, the paragraph gives a very 
limited view of what Occupational Survey Reports provide the 
training community and their potential for training assessment. 

FINDING K: Alternative Data Sources: The Job Performance Measurement 
Project. The GAO reported a key impediment to establishing 
a field evaluation component of training assessment is 
the expense of developing, testing, and administering measures 
that validly and reliably measure actual performance. The GAO 
noted that, beginning in the early eighties, a major effort 
entitled "The Joint-Service Job Performance Measurement 
Project," designed to address the measurement issues, has been 
underway under the direction of the Office of Accession Policy 
located in the Office of the Assistant Secretary of Defense 
(Force Management and Personnel). The GAO reported that this 
project was initiated after the Armed Services Vocational Aptitude 
Battery unintentionally allowed some 300,000 less qualified 
recruits into the Military Services and resulted in field commanders' 
complaints of quality degradation among their personnel. 

The GAO found that the Joint Performance Measurement project: 

- did not set out to establish a link between school- 
house performance and field performance; 






- concluded suitable measures of field performance did 
not exist and undertook to develop them; 

- has not reported any analyses of sex- and race-re- 
lated differences, and has not addressed the school- 
house/field connection; and 

- concluded performance measures were expensive to 
develop and frequently costly to administer and, 
therefore, may not be suited to more routine use as 
measures of training effectiveness. 

The GAO concluded that the investment made to develop the performance 
measures and their surrogates could prove to be more profitable 
if some of the measures developed and the lessons learned 
were more widely applied to the development of realistic assess- 
ment procedures for training. The GAO further concluded that the 
lack of other objective, systematically collected field evalua- 
tion data renders meaningful evaluation of training effectiveness 
impossible. The GAO observed that decision makers in the Con- 
gress, the DoD, or the Services can only react to problems in the 
field after they have become apparent and have been identified as 
training-related. The GAO concluded that, given the cost and 
complexity of today's military equipment, it is difficult to 
understand the lack of evaluative data to monitor how well Ser- 
vice personnel are being prepared to use and maintain those 
weapons. Overall, the GAO concluded that, among the most serious 
deficiencies it identified, was the inability of the Air Force 
and the Navy to found their evaluation of their selection proce- 
dures and schoolhouse training in systematically collected, 
objective field performance data. The GAO further concluded 
that, without good performance measurement data, the Services are 
not able to maximize training effectiveness, or even estimate 
realistically the success of their training investment in producing 
skilled operators and maintainers of today's and tomorrow's 
sophisticated weaponry. (pp. 4-10 to 5-4/GAO Draft Report) 

DoD Response: Partially concur. The GAO analysis of the background, 
purposes, and findings thus far from the Joint-Service 
Job Performance Measurement Program are generally accurate. The 
GAO has also correctly identified that hands-on performance 
measures are resource-intensive in terms of labor, cost, time, 
and equipment, which limits their value for routine use as field 
measures of training effectiveness. The issue of applying job 
performance measurement technology to training was investigated 
in May 1985, when the Assistant Secretary of Defense (Manpower, 
Installations, & Logistics) solicited Service responses to an 
inquiry from Congressman Les Aspin, Chairman of the House Committee 
on Armed Services. One of the Chairman's questions specifically 
asked about Service plans for applying job performance data 
to training course design and evaluation. The Service responses 






suggested how they anticipated potential applications of job 
performance measurement data. Each of the Services offered a 
plan for institutionalization of job performance measures and 
they identified training evaluation as a likely additional appli- 
cation of Job Performance Measurement technology, to include 
introducing performance measurement into the training feedback 
system. The resource factors identified by the GAO, coupled with 
the need to wait until completion of the enlistment standards 
setting portion of the Job Performance Measurement 
research, resulted in the decision to defer full-scale implementation 
of routine job performance data collection for all occupations. 

It should be noted there is Service work ongoing that examines 
the link between schoolhouse performance and field performance. 
For example, the Army's Selection and Classification research 
program (which incorporates the Army's contribution to the Joint- 
Service Job Performance Measurement Project) is examining the 
link between schoolhouse performance and job performance. 
Schoolhouse (end-of-training) and job performance measures have 
been developed and administered to a longitudinal sample in 
several military occupational specialties. In addition, school 
grades and Skill Qualification Test scores have been obtained for 
the sample and analyses are underway. The Air Force, Navy, and 
Marine Corps have been performing similar analyses and the 
results will be applicable to understanding the link between 
schoolhouse performance and on-the-job performance. 

Work is also underway in all of the Services to determine the 
efficacy of performance surrogates for specific purposes. There 
are technical and policy differences related to measuring job 
performance for validating a test and measuring job performance 
for evaluating a training system. Nevertheless, if research 
efforts are successful, it may be possible to use surrogates to 
develop cost-effective field performance feedback procedures that 
could help guide curriculum development. 



RECOMMENDATIONS 

RECOMMENDATION 1: The GAO recommended that the Assistant Secretary 
of Defense (Force Management and Personnel) direct the 
personnel research it coordinates among the individual Services 
to investigate more sensitive predictors of schoolhouse performance 
for women and minority students from the Armed Services 
Vocational Aptitude Battery data it already possesses. 
(p. 5-4/GAO Draft Report) 







DoD Response: Concur. The Office of the Assistant Secretary of 
Defense (Force Management and Personnel) will prepare a memorandum 
to the Defense Manpower Data Center and the Services requesting 
that the recommended analyses be performed. We will also 
ensure that research in progress pertaining to computerized 
enlistment test development will include analyses to determine 
the sensitivity of the tests as predictors of schoolhouse performance 
for gender and minority subgroups. 

RECOMMENDATION 2: The GAO recommended that the Secretary of the 
Army direct the Training and Doctrine Command to review the 
schoolhouse grading procedures identified within the report as 
deficient for their accuracy, appropriateness, and reliability. 
(p. 5-4/GAO Draft Report) 

DoD Response: Concur. The Secretary of the Army will direct the 
Training and Doctrine Command to review the appropriateness of 
Fort Gordon's testing procedures and their compliance with Army 
policy. A plan of action to remedy any existing deficiencies 
will be prepared by August 1990. 

RECOMMENDATION 3: The GAO recommended that the Secretary of the 
Navy establish a firm deadline for developing a training evalua- 
tion program and that he direct that the adequacy of current 
resources allocated to this effort be reexamined. (p. 5-4/GAO 
Draft Report) 

DoD Response: Concur. The Navy has several training evaluation 
programs already in place. As mentioned previously, these 
include the Navy Training Appraisal, the Navy Training Requirements 
Review, the Fleet Training Appraisal Program, the Maintenance 
Training Improvement Program and the Training Performance 
Evaluation Board. Additionally, the Chief of Naval Education and 
Training plans to brief, by July 1990, an enhanced integrated 
training feedback system to the Chief of Naval Personnel. A Plan 
of Action and Milestones will be prepared by August of 1990 to 
implement that system. 

RECOMMENDATION 4: The GAO recommended that the Assistant Secretary 
of Defense (Force Management and Personnel) review alternative 
measures of field performance already developed by the 
Services under the Job Performance Measurement project for potential 
applicability to training and on-the-job performance evaluation. 
(pp. 5-4 and 5-5/GAO Draft Report) 

DoD Response: Concur. During the mid-1980s, the DoD explored 
applications of the measures developed in the Joint-Service Job 
Performance Measurement Program to training. While the decision 
made following that review was to defer full-scale implementation 
because of cost factors and the fact that techniques for developing 
the performance measures were still being refined, the 
Department will again explore the feasibility of expanding their 
use through the auspices of the Joint-Service Job Performance 
Measurement Working Group. The review is expected to be completed 
following final performance measurement development during 
Fiscal Year 1991. 







Appendix VI 

Major Contributors to This Report 



Program Evaluation 
and Methodology 
Division 



Michael J. Wargo, Issue Area Director 
Richard T. Barnes, Assistant Director 
Robert E. White, Project Manager 
Kurt R. Kroemer, Project Staff 






